How Generative AI Changes Information Discovery (9 May 2024)

Uploaded by Luran Zhang

ENABLING DISCOVERY | POWERING EDUCATION | SHAPING WORKFORCES

How Generative AI Changes Information Discovery

Enabling a Deeper, Wider, Faster and More Personalized Information Discovery

Dr. Hong Zhou
Director of Intelligent Services Group & AI R&D, Wiley Partner Solutions
Presentation Overview

Context
• Digital Transformation with ABC
• AI is an inevitable trend
• From Content to Knowledge

AI for Researchers
• Content Generation
• Information Discovery
• Content Accessibility

Future Thoughts
• The Future of AI in Scholarly Publishing

Wiley Green External 2021-10.potx
Digital Transformation: AI + Big Data + Cloud (ABC)
Goal: To facilitate people (not replace them)

• Automation: improved efficiency & reduced cost
• Intelligence: improved data-informed decision making & accuracy
• Connection: improved discoverability, readability and accessibility

AI Development is an Inevitable Trend

Context:
• From the Information Age to the Intelligent Age
• Global digital transformation

3 key elements for AI:
• Big Data: increased 126 million-fold
• Computing Power: increased 100 million-fold
• AI Models: increased 1 billion-fold

From Content to Knowledge

Content → Knowledge: Aggregate, Enrich, Link

Name  UserId     PubId     PageId  ActionTime
1     251819348  41518104  1033    48:21.3
2     298382030  41219686  1033    48:22.4
3     298382032  41415282  1033    48:25.0
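The Aggregate, Enrich and Link steps can be sketched on the three sample log rows above. This is a minimal illustration, not Wiley's pipeline. The topic metadata in `PUB_METADATA` is hypothetical, and "Link" is reduced here to a simple user-to-topic graph.

```python
from collections import Counter, defaultdict

# The three usage-log rows from the slide: (UserId, PubId, PageId, ActionTime).
LOG = [
    (251819348, 41518104, 1033, "48:21.3"),
    (298382030, 41219686, 1033, "48:22.4"),
    (298382032, 41415282, 1033, "48:25.0"),
]

# Hypothetical publication metadata for the "Enrich" step (not from the slide).
PUB_METADATA = {
    41518104: {"topic": "chemistry"},
    41219686: {"topic": "medicine"},
    41415282: {"topic": "chemistry"},
}


def aggregate(log):
    """Aggregate: count logged actions per PageId."""
    return Counter(page_id for _user, _pub, page_id, _time in log)


def enrich(log, metadata):
    """Enrich: attach the publication's topic to each row (None if unknown)."""
    return [
        {"user": user, "pub": pub, "topic": metadata.get(pub, {}).get("topic")}
        for user, pub, _page, _time in log
    ]


def link(rows):
    """Link: connect each user to the topics they engaged with."""
    graph = defaultdict(set)
    for row in rows:
        if row["topic"] is not None:
            graph[row["user"]].add(row["topic"])
    return dict(graph)


views = aggregate(LOG)
user_topics = link(enrich(LOG, PUB_METADATA))
print(views[1033])             # 3
print(user_topics[298382030])  # {'medicine'}
```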

Information Discovery Movement

Web movement and information discovery movement:

Web 1.0 | Web 2.0 | Web 3.0
Typically read-only | Strongly read-write | Read-write-interact
Owned content | Shared content | Consolidated content
Web page | Web service endpoint | Data space
HTML/HTTP/URL/Portals | XML/RSS | RDF/RDFS/OWL
Page views | Cost per click | User engagement
Focus on the company | Focus on the community | Focus on the individual

https://www.researchgate.net/publication/228503784_Challenges_and_Reflections_on_Information_Knowledge_and_Wisdom_Societies_Sociotechnical_Systems

How ChatGPT is Related to Other AI Concepts

Now: Artificial Narrow Intelligence (ANI)
• Large Language Models (LLMs) such as GPT-X, BERT: trained on vast amounts of text data, they can understand human language and are used for content generation, translation, summarization, classification, Q&A and more.
• Generative AI such as ChatGPT, Bard and DALL-E 2: utilizes LLMs as the underlying technology and is trained and finetuned with more data to create content such as text, images and videos.
• Other applications based on LLMs, such as language translation, sentiment analysis, information discovery…
• AI Generated Content (AIGC): output produced by generative AI in the form of text, images, video, music, code…

Future goal: Artificial General Intelligence (AGI)
A type of AI that possesses general intelligence similar to human intelligence.

Challenges & Problems
Researchers:
- Experiencing an overwhelming amount of
information.
- Difficulty in locating pertinent data and
information.
- Struggling to formulate precise inquiries.
- Staying current with new information.
- Ensuring the reliability of information
retrieval sources.
- Processing vast quantities of information.

Publishers:
- Enhance the search experience on their platform.
- Enhance user engagement on their websites.
- Increase content discoverability.
- Filter out improper content.
Information discovery on the research journey

Direction & Focus Area
• The vastness of potential research areas
• Opportunity for long-term relevance or career prospects
• Who to collaborate with

Literature Review
• How to find the most relevant and best literature
• How to read and digest vast amounts of information quickly
• How to deal with contradicting studies/findings
• How much can I trust this work? Any paper mill or retraction?

Hypothesis Formation
• Hard to frame a hypothesis that is both novel and feasible to investigate
• Repeatedly refining or even discarding hypotheses

Experiment & Conclusion
• Is there a method that fits both the research questions and the available resources?
• Having enough good-quality data
• The right analytical tools and correct interpretation of results, especially unexpected results

Publication & Feedback
• Which journal should researchers submit to?
• Who can help researchers review their work?

Information discovery on the publishing journey

Authoring
• Where to submit
• Who to work with
• What are good references

Submission & Review
• Any research misconduct?
• Who reviews the submission

Publishing
• Better content accessibility, readability and discoverability
• Discover knowledge instead of content

Discovery & Dissemination
• Who are the right audience/users
• ASEO (academic search engine optimization)
• Faster, broader, deeper and more personalized discovery

Information discovery is more than just search & recommendation.

AI-powered tools in the publishing workflow

Authoring | Submission & Review | Production | Publishing | Discovery & Dissemination

Wiley Research Exchange

Content Generation
Research Topic Suggestion
Ask ChatGPT/Gemini to suggest research topics/directions for PhD study and for journal/SI development
Prompt: “I am a PhD student and focus on NLP area. I am really interested in large language models currently. Can you suggest me a good topic for
PhD research and paper writing” and “I am a journal editor in computer science, I would like to create a new journal or special issue, please suggest
some important and popular topics for the new journal and special issue to have enough submission to make the journal or special issue successful”

ChatGPT gives a good topic with detailed potential subtopics, clear objectives and methodologies. It draws on its limited knowledge base to help students approach the topic.

Gemini gives more emerging and up-to-date ideas, with detailed research questions to be solved, and adds further resources from the internet.
Title & Abstract Generation
ChatGPT can generate a title and an abstract of any length, with a logical explanation.

Title: Embracing Generative AI in Business, Research, and Society

Explanation: “This title reflects the key points discussed throughout the entire text, highlighting the impact of generative AI and large language models in various aspects of our lives, from business strategies and applications to technology, ethics, and human-machine interaction. The title also emphasizes how the rapid evolution of AI is revolutionizing information discovery, education, research integrity, and AI research and development, while stressing the importance of trust, risk, and security management in AI systems.”

Abstract: Generative AI is revolutionizing information discovery, domain-specific applications, and task-specific solutions, while fostering advancements in human-machine interaction. As AI becomes more accessible, it's crucial to balance its benefits with ethical and governance considerations, addressing privacy concerns and regulations. Personal thoughts on the Intelligence Revolution emphasize the potential of AI in education, research, and career development, as well as the need for maintaining research integrity and managing AI trust, risk, and security. The rapid development of AI technology presents both opportunities and challenges that individuals and organizations must adapt to and address for a successful transition into an AI-driven future.

Content Summarization
Extractive Summary: Effects of Vitamin D on Endometriosis-Related Pain: A Double-Blind Clinical Trial
• Before laparoscopy, the mean pelvic pain score in the vitamin D group was 4.05±3.45 and 4.82±4.1(p=0.513) in the placebo group. Before laparoscopy, the mean dysmenorrhea pain score in the vitamin D group was
7.37±2.61 and in placebo group it was 6.42±3.04 (p=0.325).
• Table 2 shows a comparison between the 2 groups for severity of pelvic pain and/or dysmenorrhea at different time points (before laparoscopy, in second menses after laparoscopy, and at 24 weeks after laparoscopy).
At the second menses after laparoscopy, there was no significant difference between the 2 groups for pelvic pain (p=0.583) and dysmenorrhea (p=0.365), and at 24 weeks after laparoscopy there was no significant
difference between mean pain scores in the 2 groups. Mean pelvic pain at 24 weeks after laparoscopy in the vitamin D group was 0.84±1.74 and in placebo group it was 0.68±1.70 (p=0.513).
• We explored the relationship between vitamin D and endometriosis in a double-blind, randomized clinical trial looking at the effect of vitamin D supplementation on cessation of pain in proven endometriosis after
laparoscopic diagnosis and treatment.
• There may be a relationship between vitamin D and pathogenesis of endometriosis, but in our study vitamin D was not effective in treatment of endometriosis-related pain.
• In this double-blind, randomized clinical trial, at 24 weeks after laparoscopic treatment of endometriosis there was no significant difference between effect of vitamin D3 (cholecalciferol) and placebo on severity of
dysmenorrhea and/or pelvic pain.
• The remaining 39 cases were randomly assigned in vitamin D (n=19) or placebo treatment (n=20) groups.
• After authorization by the university Ethics Committee, eligible patients were assigned by simple randomization to receive either vitamin D or placebo. In the vitamin D group (D group), we prescribed oral vitamin D 50 000 iu/weekly for 12 weeks (capsule D-Vigel, vitamin D3 50 000 iu, Daana Pharma Co. Tabriz-Iran) and in the placebo group (P group) we prescribed 1 capsule of placebo (Daana Pharma Co. Tabriz-Iran) weekly for 12 weeks.
• Mean dysmenorrhea was 2.10±2.33 in the vitamin D group and 2.73±2.84 in the placebo group (p=0.45).

ChatGPT Summary
This paper presents the results of a double-blind, randomized clinical trial that investigated the effect of vitamin D supplementation on the cessation of pain in women with endometriosis. The study involved 39 women with proven endometriosis who were randomly assigned to receive either vitamin D or placebo treatment. The severity of dysmenorrhea (menstrual pain) and pelvic pain was assessed using a visual analogue scale (VAS) test before laparoscopy, at the second menstrual cycle after laparoscopy, and at 24 weeks after laparoscopy. The results showed that there was no significant difference in pain severity between the vitamin D and placebo groups at 24 weeks after laparoscopy. This study is the first randomized clinical trial to explore the potential relationship between vitamin D treatment and relief of endometriosis-related pain. However, it should be noted that the sample size was small, and the study did not measure the participants' vitamin D levels before intervention. Therefore, the authors suggest that the observed lack of significant difference in pain severity could be attributed to the possibility that the prescribed dose and duration of vitamin D supplementation were only beneficial for treating vitamin D deficiency rather than endometriosis itself.

Extraction + ChatGPT Summary
This paper presents the findings of a double-blind, randomized clinical trial that aimed to explore the relationship between vitamin D supplementation and endometriosis-related pain. The study focused on the effect of vitamin D on the cessation of pain in individuals with proven endometriosis after laparoscopic diagnosis and treatment. A total of 39 eligible patients were randomly assigned to either the vitamin D group (n=19) or the placebo group (n=20). The study measured the severity of pelvic pain and dysmenorrhea before laparoscopy in both groups, finding no significant difference between the two groups. Similar results were observed at the second menses after laparoscopy and at 24 weeks after laparoscopy, indicating that vitamin D supplementation did not have a significant impact on pain reduction compared to the placebo group. Mean pain scores for pelvic pain and dysmenorrhea at 24 weeks after laparoscopy were comparable between the vitamin D and placebo groups. Specifically, the mean pelvic pain score was 0.84±1.74 in the vitamin D group and 0.68±1.70 in the placebo group (p=0.513). Furthermore, the mean dysmenorrhea score was 2.10±2.33 in the vitamin D group and 2.73±2.84 in the placebo group (p=0.45). Although there is a potential relationship between vitamin D and the pathogenesis of endometriosis, the results of this study indicate that vitamin D supplementation was not effective in treating endometriosis-related pain. Therefore, the findings suggest that alternative treatment approaches should be considered for managing pain in individuals with endometriosis.

ChatGPT produces a better, more fluent summary than Gemini. However, it is much more expensive and includes less detail than the extractive summary.
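The "Extraction + ChatGPT" approach can be sketched as a two-stage pipeline. The first stage picks the most informative sentences, and the second asks an LLM to rewrite only those. This is a minimal sketch that assumes a simple word-frequency scorer for the extractive stage. That scorer is one common baseline, not necessarily the extractor used here. The LLM call itself is omitted, and `build_abstractive_prompt` is a hypothetical helper that only assembles the prompt text.

```python
import re
from collections import Counter

# Small stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "were",
             "for", "on", "with", "is", "at", "it", "that"}


def _terms(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=3):
    """Return the k sentences with the highest average document-wide term frequency."""
    sentences = split_sentences(text)
    freq = Counter(_terms(text))

    def score(sentence):
        terms = _terms(sentence)
        return sum(freq[t] for t in terms) / (len(terms) or 1)

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]  # keep the original sentence order


def build_abstractive_prompt(extracted):
    """Hypothetical helper: the prompt an LLM would receive in the second stage."""
    bullets = "\n".join(f"- {s}" for s in extracted)
    return ("Rewrite these key sentences from a paper as one fluent summary. "
            "Keep every number and p-value unchanged.\n" + bullets)
```

Passing only the extracted sentences keeps figures such as p-values verbatim in the model's input and shortens the prompt. That can make the hybrid cheaper than summarizing the full text.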

Deeper Information Discovery

Challenges
• Information overload
• Structured knowledge embedded in unstructured text

Solutions
• Key entity identification & extraction
• Knowledge mining & search

Specific Entities Mining
Knowledge mining takes valuable information from customers' existing content to create a more structured layout and generate new business opportunities.

Existing content → Structured data

#  Compound   Property             Value   Unit
1  Inogatran  molecular weight     439     Da
2  PS-b-P4VP  molecular weight     59 000  g mol-1
3  diamond    heat of evaporation  170     kcal/mol

What: Generate new content bundles with new topics, new content types and new information across languages.

260K triples from 1 million chemistry articles
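A single pattern can turn a sentence into a (compound, property, value, unit) row like those in the table above. This sketch assumes hypothetical input sentences of the form "X has a P of V U". Production knowledge mining over a million articles would use trained entity and relation extraction models, not one regex.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    compound: str
    prop: str
    value: float
    unit: str


# Hypothetical surface pattern: "<compound> has a(n) <property> of <number> <unit>".
PATTERN = re.compile(
    r"(?P<compound>[\w-]+) has an? (?P<prop>[a-z ]+?) of "
    r"(?P<value>\d[\d ]*(?:\.\d+)?)\s*(?P<unit>[A-Za-z/ 1-]+?)(?=[.;,]|$)"
)


def extract_triples(text):
    """Return one Triple per matching fragment of text."""
    return [
        Triple(m["compound"], m["prop"],
               float(m["value"].replace(" ", "")),  # "59 000 " -> 59000.0
               m["unit"].strip())
        for m in PATTERN.finditer(text)
    ]


# Hypothetical input sentences modelled on the rows in the table.
TEXT = ("Inogatran has a molecular weight of 439 Da. "
        "PS-b-P4VP has a molecular weight of 59 000 g mol-1. "
        "Diamond has a heat of evaporation of 170 kcal/mol.")

for triple in extract_triples(TEXT):
    print(triple)
```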


From Content to Knowledge

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2820%2930304-4/fulltext
Personalized and Richer Information Discovery

Challenges
• Open Science movement with rich data
• Researchers must find answers by themselves from a long list of search results

Solutions
• Multimedia search
• Personalized recommendation
• Natural language Q&A

Where to Submit – Journal Suggestion
ChatGPT and Journal Finder give some suggestions in common, but only the dedicated Journal Finder gives the correct answer.
ChatGPT and Google Gemini recommend only related top-tier journals, but Gemini can also give relevance scores.

This paper is published in The New England Journal of Medicine.

ChatGPT vs Journal Finder
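A dedicated journal finder can rank journals by how closely a manuscript matches each journal's scope and return a relevance score with each suggestion. This minimal sketch assumes bag-of-words cosine similarity. The journal names and scope texts in `SCOPES` are made up, and the sketch does not describe how Journal Finder actually works.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_journals(abstract, scopes, top_n=3):
    """Rank journals by similarity between the abstract and each scope statement."""
    query = Counter(tokens(abstract))
    scored = [(name, round(cosine(query, Counter(tokens(scope))), 3))
              for name, scope in scopes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical journals and scope statements, for illustration only.
SCOPES = {
    "Journal A": "clinical trials vaccines infectious disease medicine",
    "Journal B": "entity linking embeddings natural language processing",
    "Journal C": "polymer chemistry materials synthesis",
}

print(suggest_journals("word and entity embeddings for entity linking", SCOPES))
```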

Multimedia Content Discovery
Multimedia content discovery includes topic, image, video and funder search.

Recommendations & Personalized News Feeds
Recommendation is one of the most common applications of AI. It ranges from suggesting relevant experts to identifying interest groups and enabling a personalized user experience based on individual interests.
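Content-based recommendation is one of the simplest forms of this idea. It builds an interest profile from what a user has already read, then ranks unseen items by how well their tags match that profile. This is a minimal sketch, and the item IDs and topic tags are hypothetical.

```python
from collections import Counter


def build_profile(history):
    """Interest profile: how often each tag appears in items the user has read."""
    profile = Counter()
    for tags in history:
        profile.update(tags)
    return profile


def recommend(profile, candidates, seen, k=2):
    """Rank unseen candidates by the summed profile weight of their tags."""
    scored = [(item_id, sum(profile[tag] for tag in tags))
              for item_id, tags in candidates.items() if item_id not in seen]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, ties by ID
    return [item_id for item_id, score in scored[:k] if score > 0]


# Hypothetical reading history and candidate items with topic tags.
history = [{"llm", "nlp"}, {"llm", "search"}]
candidates = {
    "a": {"llm"},
    "b": {"chemistry"},
    "c": {"nlp", "search"},
    "d": {"llm", "nlp"},
}
print(recommend(build_profile(history), candidates, seen={"d"}))  # ['a', 'c']
```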

Reviewer Suggestion
A journal editor gives the AI the title and abstract of a paper and asks it to suggest reviewers:
I am a journal editor in computer science area and I have following paper with title:{Bilinear joint learning of word and entity embeddings for Entity Linking} and abstract:{Entity Linking (EL) is the task of resolving mentions to
referential entities in a knowledge base, which facilitates applications such as information retrieval, question answering, and knowledge base population. In this paper, we propose a novel embedding method specifically designed
for EL. The proposed model jointly learns word and entity embeddings which are located in different distributed spaces, and a bilinear model is introduced to simulate the interaction between words and entities. We treat EL as a
ranking problem, and utilize a pairwise learning-to-rank framework with features constructed with learned embeddings as well as conventional EL features. Experimental results show the proposed model produces effective
embeddings which improve the performance of our EL algorithm. Our method yields the state-of-the-art performances on two benchmark datasets CoNLL and TAC-KBP 2010.}. Can you give me recommendations about reviewers
for this paper?
Results: ChatGPT vs Gemini vs our own Reviewer Finder
• ChatGPT suggests reviewers who do not exist.
• Gemini returns authors with photos and sources, but some of the affiliation information is incorrect.
• Our Reviewer Finder returns real, relevant researchers with their h-index, ORCID, relevancy score and relevant publications.

Bing, Gemini and ChatGPT all perform poorly in this experiment, and their results contain serious issues. A dedicated reviewer suggestion service gives more reliable results with richer information!
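One reason a dedicated service avoids non-existent reviewers is that it only ranks people already present in a researcher index. It does not generate names. This minimal sketch makes three assumptions: a hypothetical in-memory index of candidates with keyword profiles, Jaccard keyword overlap as the relevance score, and a simple conflict-of-interest rule that excludes the paper's own authors.

```python
def jaccard(a, b):
    """Overlap between two keyword sets, from 0 (none) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_reviewers(paper_keywords, paper_authors, index, k=3):
    """Rank researchers from an existing index; never invents names."""
    ranked = []
    for name, keywords in index.items():
        if name in paper_authors:
            continue  # simple conflict-of-interest rule
        score = jaccard(paper_keywords, keywords)
        if score > 0:
            ranked.append((name, round(score, 2)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical researcher index and manuscript keywords.
INDEX = {
    "Author X": {"entity linking", "embeddings"},
    "Researcher A": {"entity linking", "knowledge base"},
    "Researcher B": {"entity linking", "embeddings", "learning to rank"},
    "Researcher C": {"polymers"},
}
paper = {"entity linking", "embeddings", "learning to rank"}
print(suggest_reviewers(paper, {"Author X"}, INDEX))
# [('Researcher B', 1.0), ('Researcher A', 0.25)]
```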
New Way of Search
Question: What is the latest study progress about large pre-trained language models?

For scholarly research use cases, Bing gives a better result than the other two. ChatGPT limits its answers to its own training data. Bing generates its result from a web search and also returns the related articles with links, and these are real, recent articles.

• Google Gemini (add search in GAI): Gemini currently understands ~133 languages, and it can support image search.
• ChatGPT (GAI-native application): ChatGPT 3.5 uses only pretrained knowledge, whereas ChatGPT 4 can browse the internet.
• Microsoft Bing (add GAI in search): Bing gives short answers with reliable references and less hallucination.
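The "add GAI in search" pattern behind referenced answers like Bing's can be sketched as retrieve-then-generate. The system first fetches matching documents, then gives the model only those sources, numbered, so that every claim can point to a link. This minimal sketch assumes a toy term-overlap retriever and hypothetical example.org documents. The call to the LLM is omitted.

```python
import re


def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Toy retriever: rank documents by how many query terms they share."""
    query = terms(question)
    ranked = sorted(documents, key=lambda d: len(query & terms(d["text"])), reverse=True)
    return [d for d in ranked[:k] if query & terms(d["text"])]


def grounded_prompt(question, hits):
    """Number each source so the answer can cite [1], [2], ... with links."""
    sources = "\n".join(
        f"[{i}] {d['title']} ({d['url']}): {d['text']}" for i, d in enumerate(hits, start=1)
    )
    return ("Answer using only the sources below and cite them by number. "
            "If they are insufficient, say so.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}")


# Hypothetical documents; a real system would query a search engine or index.
DOCS = [
    {"title": "Doc A", "url": "https://example.org/a",
     "text": "Recent progress in large pretrained language models"},
    {"title": "Doc B", "url": "https://example.org/b",
     "text": "Vitamin D and endometriosis-related pain"},
]
question = "What is the latest progress on large pretrained language models?"
print(grounded_prompt(question, retrieve(question, DOCS)))
```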
Personalized Recommendations
Both ChatGPT and Gemini can provide personalized recommendations based on conversations/prompts.

However, they sometimes make up answers.

Wider Information Discovery

Challenges
• Data silos
• Need for cross-disciplinary research

Solutions
• Classifications
• Milestone papers
• Research analytics

Content Classification

Taxonomies: publisher-specific taxonomies; public taxonomies (MeSH, UN SDG); global taxonomy
Auto Tagger: applies the taxonomies to documents, images, videos and news/blogs
Output: content tagged by topic (Topic 1, Topic 2, …)

Content Classification
ChatGPT cannot tag content based on a customer-specific taxonomy, and it often generates fake tags or IDs.
“Evaluation of BNT162b2 Covid-19 Vaccine in Children Younger than 5 Years of Age
Although coronavirus disease 2019 (Covid-19) is generally mild in children younger than 5 years of age, severe disease, hospitalizations,
and post-acute effects, including multisystem inflammatory syndrome in children (MIS-C), can occur.1-3 In the United States, rates of
Covid-19−associated hospitalization among children younger than 5 years of age peaked at…”
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947923/

Rank | Atypon Taxonomy tags | Dedicated MeSH Auto Tagger | ChatGPT MeSH Tagging
1 | Medicine: 0.98 | COVID-19: 0.99 | Vaccines: 0.95
2 | Pediatrics: 0.95 | Child, Preschool: 0.97 | Coronavirus disease 2019: 0.95
3 | Pandemic: 0.93 | Pandemics: 0.93 | Child: 0.90
4 | Immunization: 0.93 | Coronavirus Infections: 0.91 | Clinical Trials, Phase I: 0.85
5 | Clinical trial: 0.92 | Infant: 0.89 | Clinical Trials, Phase II: 0.85
6 | Booster: 0.73 | COVID-19 Vaccines: 0.87 | Clinical Trials, Phase III: 0.85
7 | Coronavirus disease 2019: 0.65 | Viral Vaccines: 0.85 | Immunogenicity, Vaccine: 0.80
8 | Emergency use authorization: 0.6 | Pneumonia, Viral: 0.83 | Safety: 0.80
9 | | Hospitalization: 0.77 | Efficacy: 0.80
10 | | SARS-CoV-2: 0.69 | Placebos: 0.75
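One way to stop fake tags or IDs from reaching production is to treat the LLM output as a suggestion and validate it against the customer's taxonomy. This minimal sketch assumes a hypothetical three-entry taxonomy with made-up IDs and a 0.8 score threshold. It does not reproduce the dedicated MeSH Auto Tagger.

```python
def constrain_to_taxonomy(predicted, taxonomy, threshold=0.8):
    """Keep predicted tags that exist in the taxonomy; reject the rest.

    predicted: (label, score) pairs, e.g. parsed from an LLM response.
    taxonomy:  canonical label -> taxonomy ID.
    """
    lookup = {label.casefold(): (label, tag_id) for label, tag_id in taxonomy.items()}
    kept, rejected = [], []
    for label, score in predicted:
        match = lookup.get(label.strip().casefold())
        if match is not None and score >= threshold:
            kept.append((match[0], match[1], score))  # canonical label + real ID
        else:
            rejected.append(label)  # unknown tag or low confidence
    return kept, rejected


# Hypothetical taxonomy excerpt with made-up IDs.
TAXONOMY = {"COVID-19": "ID-001", "Child, Preschool": "ID-002", "Infant": "ID-003"}
predicted = [("covid-19", 0.95), ("Infant", 0.85), ("Efficacy", 0.80), ("Child, Preschool", 0.70)]
kept, rejected = constrain_to_taxonomy(predicted, TAXONOMY)
print(kept)      # [('COVID-19', 'ID-001', 0.95), ('Infant', 'ID-003', 0.85)]
print(rejected)  # ['Efficacy', 'Child, Preschool']
```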

Milestone Work Identification

Research Analytics

New Scholarly Search Applications in the Era of GAI

Bibliography databases

Scopus AI
• Topic experts feature.
• Generates concept maps.
• The Foundational papers option lists high-impact Scopus papers on any topic.
• Synthesizes abstracts.
• Provides the references used to build the summary.

Dimensions Assistant
• Dimensions Research GPT Enterprise.
• Notifications each time generated content is based on Dimensions data.
• AI-driven summarization.

Conversational discovery

ChatGPT
• Improved scientific writing.
• Enhanced research equity.
• Utility in health care research (e.g., dataset analysis, code generation, literature reviews, and drug discovery).
• Cost savings and documentation.
• Personalized learning in health care education.
• Citation generation for research papers.

Google Gemini
• Interdisciplinary research.
• Organized and methodical approach to handling large datasets.
• Streamlines the process of identifying relevant scholarly content.
• Multimodality.

New Bing
• Based on OpenAI technology and integrated with the MS Bing search engine.
• Provides more reliable answers with trackable references.
• Provides shorter answers than ChatGPT and Google.
• Allows searching the internet or a local knowledge base.

Perplexity.ai
• GAI-native search engine.
• Provides a clean and simple way to discover without ads.
• Allows people to select different LLM models and knowledge bases.
• Suggests relevant questions.

Commercial and researcher-facing search engines

Elicit
• Leverages advanced LLMs.
• Ability to ask questions within a specific result.
• Provides summaries for the top X relevant results.
• Concept search.

Scite
• Provides a summary for the top 10 relevant results.
• Q&A feature that runs through full-text content.
• Chat history is available.
• Assistant settings can be configured.

SciSpace
• Provides high-level insights from the top five papers, with detailed summaries and insights.
• Integrated with the Copilot AI assistant, so you can chat with your paper by asking follow-up questions.

Consensus
• Leverages GPT-4 and other LLMs to summarize results for you.
• Designed to answer scientific research questions.

Content Accessibility
Different branches of AI and the accessibility needs they address:

• Speech Recognition: hearing loss; limited movement
• Computer Vision: vision impairment; cognitive learning
• Information Discovery: cognitive learning; limited movement
• Knowledge Graph: cognitive learning
• Natural Language Processing: cognitive learning; error prevention

Alt Text for Image Example
Alt text (alternative text) describes the appearance or function of an image on a page for screen readers and SEO.
• ChatGPT: The image depicts a serene winter scene, likely captured
during the early morning based on the soft lighting. It shows several
trees covered with thick, white snow, standing prominently against a
backdrop of a dense pine forest. The foreground features a wooden
fence partially buried under the snow, stretching across a field also
blanketed in snow. The entire scene conveys a tranquil, almost
untouched winter landscape, highlighting the beauty of a heavy
snowfall.

• Gemini: The image shows a row of trees covered in snow next to a fence. The trees are all different shapes and sizes, and they are all white from the snow. The fence is made of wood and metal.

Gemini can also find similar images on the internet.
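Once a model has produced a description, the text still has to be placed in the page's markup, escaped, and kept reasonably short for screen readers. This minimal sketch treats a 125-character limit as an assumed rule of thumb for concise alt text, not as a standard. `img_with_alt` is a hypothetical helper.

```python
import html


def img_with_alt(src, alt, max_len=125):
    """Build an <img> tag with whitespace-normalized, length-capped, escaped alt text."""
    alt = " ".join(alt.split())  # collapse newlines and repeated spaces
    if len(alt) > max_len:
        cut = alt[: max_len - 1].rsplit(" ", 1)[0]  # leave room for the ellipsis, avoid a half word
        alt = cut.rstrip(",;:") + "…"
    return f'<img src="{html.escape(src)}" alt="{html.escape(alt)}">'


print(img_with_alt("winter.jpg", "A row of snow-covered trees next to a wooden fence."))
```

Longer model output, such as ChatGPT's multi-sentence description above, would be trimmed at a word boundary. The full text could instead go into a longer description elsewhere on the page.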

Content to Audio Example

Microsoft, Google and Samsung all provide free apps that assist people with impaired vision by narrating their surroundings.

• Microsoft Seeing AI has already assisted people with more than 20 million tasks.

A virtual assistant powered by GPT-4 can do more.

Compound Figure Segmentation Example
(Figure: a compound figure segmented into sub-panels a to j, with panel titles such as "NAADP compound", "NAADP IP3", "Calcium release", "cis-Ned-19" and "Percent control bound"; other labels include "Antagonism" and "cADPR".)
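A simple way to split a compound figure is to cut along the blank gutters between panels, which is one level of the classic XY-cut idea. This minimal sketch runs on a tiny hypothetical binary image (1 = ink, 0 = background). Real figures would first need thresholding, and panels that touch or overlap usually need learned detectors instead.

```python
def runs(flags):
    """(start, end) ranges where flags are True; end is exclusive."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def split_panels(image):
    """Split a binary image (1 = ink, 0 = background) at fully blank gutters.

    Cuts along blank columns first, then along blank rows inside each column
    strip. Returns bounding boxes as (top, bottom, left, right), exclusive ends.
    """
    width = len(image[0])
    col_has_ink = [any(row[c] for row in image) for c in range(width)]
    boxes = []
    for left, right in runs(col_has_ink):
        row_has_ink = [any(row[left:right]) for row in image]
        for top, bottom in runs(row_has_ink):
            boxes.append((top, bottom, left, right))
    return boxes


# Tiny 4x4 "figure" with four panels separated by one-pixel gutters (hypothetical).
FIGURE = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(split_panels(FIGURE))
# [(0, 2, 0, 2), (3, 4, 0, 2), (0, 2, 3, 4), (3, 4, 3, 4)]
```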

Image Quality Enhancement Examples
AI can automatically increase contrast in images. AI can automatically enhance image resolution and
quality to improve readability.
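Automatic contrast enhancement can be as simple as a linear min-max stretch, which maps the darkest pixel to black and the brightest to white. This minimal sketch works on a flat list of hypothetical grayscale values. Resolution enhancement (super-resolution) typically relies on trained models and is not shown.

```python
def stretch_contrast(pixels, out_min=0, out_max=255):
    """Linear min-max stretch: darkest input -> out_min, brightest -> out_max."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: nothing to stretch
        return list(pixels)
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]


# A low-contrast strip of grayscale values (hypothetical).
print(stretch_contrast([10, 20, 30], 0, 100))  # [0, 50, 100]
```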

Speech-to-Text Example
AI can listen to a video or sound file and then transcribe the spoken words into text.

Accurate: ~90% transcription accuracy


Fast: Processes 30 seconds of audio in 12 seconds
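The two figures above can be made concrete. Transcription accuracy is often reported as 1 − WER (word error rate), and that reading of "~90%" is an assumption here. Processing 30 seconds of audio in 12 seconds is a real-time factor of 12 / 30 = 0.4, which means 2.5 times faster than real time. A minimal sketch of both calculations:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j]: word-level edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1 means faster than real time."""
    return processing_seconds / audio_seconds


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER = {wer:.3f}, accuracy = {1 - wer:.1%}")  # WER = 0.167, accuracy = 83.3%
print(real_time_factor(12, 30))  # 0.4
```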

Solution Changes (from AI to GAI)

• Keyword search & Boolean operators: index creation; query processing; ranking & scoring; inverted index
• Neural search: natural language processing; semantic meaning; learn to rank
• RAG: external knowledge; vector index; LLMs
• Agent: reactive or proactive agents; fixed or dynamic environment; multi-agent systems
• Real GAI ("Generative AI"): fast finetuning with the latest knowledge; large context input; on-demand answer generation; rich input & output
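This evolution starts from keyword search with Boolean operators, which rests on an inverted index mapping each term to the documents that contain it. The minimal sketch below uses hypothetical documents and supports AND (`must`), OR (`should`) and NOT (`must_not`). Neural search, RAG and agents add semantic matching and generation on top of this step, or replace it.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Inverted index: term -> set of IDs of documents containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def boolean_search(index, all_ids, must=(), should=(), must_not=()):
    """AND over `must`, OR over `should`, then remove anything matching `must_not`."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term, set())
    if should:
        result &= set().union(*(index.get(term, set()) for term in should))
    for term in must_not:
        result -= index.get(term, set())
    return result


# Hypothetical documents.
DOCS = {
    1: "generative AI search",
    2: "keyword search with Boolean operators",
    3: "generative models for images",
}
INDEX = build_index(DOCS)
print(boolean_search(INDEX, DOCS, must=["generative"], must_not=["images"]))  # {1}
print(boolean_search(INDEX, DOCS, should=["keyword", "images"]))  # {2, 3}
```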

AI Revolutionizes Information Discovery & Interaction

Results: Wider | Deeper | Richer
User experience: Accurate | Personalized | Intelligent
Interaction: Text query | Multimedia | Conversational

Risks & Limitations

Generation
• Hallucination
• Unclear ownership
• Insufficient audit capability

Dissemination
• Blackbox solution
• Information cocoon
• Information bias

Consumption
• Data privacy & security
• AI trust
• IP issue
• Copyright

Future Thoughts

• Authorship & ownership change of content in Web 3.0
• Embracing AI collaboration
• Defining the right problem first
• Multimodal AI: richer inputs and outputs, including voice, images and videos, for information discovery
• Rethinking business objectives and research metrics
• Personal research assistants
• Upholding research integrity
• AI governance
• Don't fully rely on GAI
Questions?
Thank you!

Hong's Posts in the Scholarly Kitchen
