Knowledge Representation and Reasoning
* [**overview**](#overview)
* [**knowledge representation**](#knowledge-representation)
- [**natural language**](#knowledge-representation---natural-language)
- [**knowledge graph**](#knowledge-representation---knowledge-graph)
- [**probabilistic database**](#knowledge-representation---probabilistic-database)
- [**probabilistic program**](#knowledge-representation---probabilistic-program)
- [**causal graph**](#knowledge-representation---causal-graph)
- [**distributed representation**](#knowledge-representation---distributed-representation)
* [**reasoning**](#reasoning)
- [**natural logic**](#reasoning---natural-logic)
- [**formal logic**](#reasoning---formal-logic)
- [**bayesian reasoning**](#reasoning---bayesian-reasoning)
- [**causal reasoning**](#reasoning---causal-reasoning)
- [**neural reasoning**](#reasoning---neural-reasoning)
* [**interesting papers**](#interesting-papers)
- [**knowledge bases**](#interesting-papers---knowledge-bases)
- [**reasoning**](#interesting-papers---reasoning)
---
["What Is a Knowledge Representation"](https://aaai.org/ojs/index.php/aimagazine/article/view/1029) by Davis, Shrobe, Szolovits `paper`
<https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning>
----
["Knowledge Representation and Reasoning"](https://goo.gl/JV1JhZ) by Brachman and Levesque `book`
["Handbook of Knowledge Representation"](http://dai.fmph.uniba.sk/~sefranek/kri/handbook/) by van Harmelen, Lifschitz, Porter `book`
----
----
- [**natural language**](#knowledge-representation---natural-language)
- [**knowledge graph**](#knowledge-representation---knowledge-graph)
* [**ontology**](#knowledge-graph---ontology)
* [**relational learning**](#knowledge-graph---relational-learning)
- [**probabilistic database**](#knowledge-representation---probabilistic-database)
* [**BayesDB**](#probabilistic-database---bayesdb)
* [**Epistemological Database**](#probabilistic-database---epistemological-database)
* [**ProPPR**](#probabilistic-database---proppr)
- [**probabilistic program**](#knowledge-representation---probabilistic-program)
- [**causal graph**](#knowledge-representation---causal-graph)
- [**distributed representation**](#knowledge-representation---distributed-representation)
----
interesting papers:
- [**knowledge bases**](#interesting-papers---knowledge-bases)
---
[**Natural Language Processing**](https://github.com/brylevkirill/notes/blob/master/Natural%20Language%20Processing.md#overview)
----
- [**ontology**](#knowledge-graph---ontology)
- [**relational learning**](#knowledge-graph---relational-learning)
----
---
[**RDF**](https://github.com/brylevkirill/tech/blob/master/RDF/RDF.txt)
(Resource Description Framework) `summary`
----
[**OWL**](https://github.com/brylevkirill/tech/blob/master/RDF/OWL.txt)
(Web Ontology Language) `summary`
["OWL: The Web Ontology Language"](https://youtube.com/watch?v=EXXIIlfqb0c) by Pavel Klinov `video`
----
[**schema.org**](https://github.com/brylevkirill/tech/blob/master/RDF/schema.org.txt) `summary`
> "We report some key schema.org adoption metrics from a sample of 10
billion pages from a combination of the Google index and Web Data
Commons. In this sample, 31.3% of pages have schema.org markup, up from
22% one year ago. Structured data markup is now a core part of the modern
web."
> "RDF on the web marked with schema.org types is the largest existing
structured knowledge base."
> "I don't think we even have the beginnings of a theory [entity resolution, graph reconciliation]."
----
---
[overview](https://soundcloud.com/nlp-highlights/83-knowledge-base-construction-with-sebastian-riedel) by Sebastian Riedel `audio`
----
- [AKBC 2019 (videos)](https://youtube.com/channel/UCzKZf82vIuI8uMazyL0LIvQ/videos)
- [AKBC 2017](http://akbc.ws/2017/)
- [AKBC 2016](http://akbc.ws/2016/)
- [AKBC 2014 (videos)](http://youtube.com/user/NeuralInformationPro/search?query=AKBC)
----
[**interesting papers**](#interesting-papers---information-extraction-and-integration)
---
- data = matrix
relational learning:
problems:
Concerned with models of domains that exhibit both uncertainty (which can
be dealt with using statistical methods) and complex, relational structure.
Combines machine learning with relational data models and first-order logic
and enables machine learning in knowledge bases.
Knowledge representation formalisms use (a subset of) first-order logic to
describe relational properties of a domain in a general manner (universal
quantification) and draw upon probabilistic graphical models (such as
Bayesian networks or Markov networks) to model the uncertainty.
----
[course](https://youtube.com/playlist?list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn) by Jure Leskovec *(lectures 10 and 11)* `video`
----
["Statistical Relational Learning"](http://videolectures.net/mlpmsummerschool2014_tresp_statistical_learning) tutorial by Tresp `video`
----
- [**tensor factorization**](#relational-learning---tensor-factorization)
- [**embedding models**](#relational-learning---embedding-models)
(used in [**NELL**](#machine-reading-projects---nell))
(used in [**ProPPR**](#probabilistic-database---proppr))
["Efficient Inference and Learning in a Large Knowledge Base: Reasoning with Extracted Information using a Locally Groundable First-Order Probabilistic Logic"](#efficient-inference-and-learning-in-a-large-knowledge-base-reasoning-with-extracted-information-using-a-locally-groundable-first-order-probabilistic-logic-wang-mazaitis-lao-mitchell-cohen) by Wang et al. `paper` `summary` ([talk](http://youtu.be/--pYaISROqE?t=12m35s) `video`)
----
- [overview](http://videolectures.net/mlpmsummerschool2014_tresp_statistical_learning/) (part 2, 1:19:37) by Volker Tresp `video`
- [overview](http://videolectures.net/kdd2014_gabrilovich_bordes_knowledge_graphs/) (part 2, 1:07:43) by Antoine Bordes `video`
[graph embeddings](https://gist.github.com/mommi84/07f7c044fa18aaaa7b5133230207d8d4)
[**embeddings of natural language**](https://github.com/brylevkirill/notes/blob/master/Natural%20Language%20Processing.md#embeddings)
----
with latent features one gets collective learning - information can globally propagate in the network of random variables
[overview](http://videolectures.net/kdd2014_gabrilovich_bordes_knowledge_graphs) (part 2, 1:00:12) by Evgeniy Gabrilovich `video`
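As a toy illustration of such a latent-feature model, here is a minimal sketch of one well-known embedding approach (TransE, not specifically endorsed by the text; the 2-d embedding values are made up for illustration): a triple (head, relation, tail) is plausible when head + relation lands close to tail.

```python
import math

# toy hand-set 2-d embeddings; the entities, relation, and values are illustrative only
emb = {
    "Paris": [0.9, 0.1], "France": [1.0, 0.9],
    "Berlin": [0.2, 0.1], "Germany": [0.3, 0.9],
    "capital_of": [0.1, 0.8],
}

def score(h, r, t):
    # TransE energy ||h + r - t||; lower means more plausible
    return math.sqrt(sum((hi + ri - ti) ** 2
                         for hi, ri, ti in zip(emb[h], emb[r], emb[t])))

# a true triple should score lower (better) than a corrupted one
assert score("Paris", "capital_of", "France") < score("Paris", "capital_of", "Germany")
```

Because entities share one embedding space, evidence about one triple moves its entity vectors and thereby changes the scores of all other triples touching those entities - the collective-learning effect mentioned above.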
----
([talk](http://techtalks.tv/talks/injecting-logical-background-knowledge-
into-embeddings-for-relation-extraction/61526/) `video`)
([slides](http://www.dbs.ifi.lmu.de/~tresp/papers/ESWC-Keynote.pdf))
----
----
----
----
- in MLN one can use any FOL formula, not just the ones derived from Bayes nets
---
["Probabilistic Databases"](http://www.dblab.ntua.gr/~gtsat/collection/Morgan%20Claypool/Probabilistic%20Databases%20-%20Dan%20Suciu%20-%20Morgan%20Clayman.pdf) by Suciu et al. `paper`
----
----
- [**BayesDB**](#probabilistic-database---bayesdb)
- [**Epistemological Database**](#probabilistic-database---epistemological-database)
- [**ProPPR**](#probabilistic-database---proppr)
---
[BayesDB](http://probcomp.csail.mit.edu/software/bayesdb/) project
[overview](https://youtube.com/watch?v=-8QMqSWU76Q) by Vikash Mansinghka `video`
[overview](https://youtu.be/Rte-y6ThwAQ?t=17m33s) by Vikash Mansinghka `video`
---
----
epistemological database:
problems:
---
----
> "Most subfields of computer science have an interface layer via which
applications communicate with the infrastructure, and this is key to their
success (e.g., the Internet in networking, the relational model in databases,
etc.). So far this interface layer has been missing in AI. First-order logic and
probabilistic graphical models each have some of the necessary features, but
a viable interface layer requires combining both. Markov logic is a powerful
new language that accomplishes this by attaching weights to first-order
formulas and treating them as templates for features of Markov random
fields. Most statistical models in wide use are special cases of Markov logic,
and first-order logic is its infinite-weight limit. Inference algorithms for
Markov logic combine ideas from satisfiability, Markov chain Monte Carlo,
belief propagation, and resolution. Learning algorithms make use of
conditional likelihood, convex optimization, and inductive logic
programming."
["Markov Logic"](http://homes.cs.washington.edu/~pedrod/papers/pilp.pdf)
by Domingos et al. `paper`
----
["Markov Logic Networks for Natural Language Question Answering"](#markov-logic-networks-for-natural-language-question-answering-khot-balasubramanian-gribkoff-sabharwal-clark-etzioni) by Khot et al. `paper` `summary`
----
"Markov Logic Networks use weighted first order logic formulas to construct
probabilistic models, and specify distributions over possible worlds. Markov
Logic Networks can also be encoded as probabilistic programs."
----
The canonical example uses weighted rules such as Smokes(x) => Cancer(x) and Friends(x,y) => (Smokes(x) <=> Smokes(y)): if someone smokes, there is a chance that they get cancer, and the smoking behaviour of friends is usually similar. Markov logic uses such weighted rules to derive a probability distribution over possible worlds through an undirected graphical model. This probability distribution over possible worlds is then used to draw inferences. Weighting the rules is a way of softening them compared to hard logical constraints, thereby allowing situations in which not all clauses are satisfied.
MLNs take as input a set of weighted first-order formulas F = F1, ..., Fn.
They then compute a set of ground literals by grounding all predicates
occurring in F with all possible constants in the system. Next, they define a
probability distribution over possible worlds, where a world is a truth
assignment to the set of all ground literals. The probability of a world
depends on the weights of the input formulas F as follows: The probability of
a world increases exponentially with the total weight of the ground clauses
that it satisfies. The probability of a given world x is defined as:
P(X = x) = 1/Z * exp(∑_i w_i * n_i(x)), where w_i is the weight of formula F_i and n_i(x) is the number of true groundings of F_i in world x
"MLNs make the Domain Closure Assumption: The only models considered
for a set F of formulas are those for which the following three conditions hold.
(a) Different constants refer to different objects in the domain, (b) the only
objects in the domain are those that can be represented using the constant
and function symbols in F, and (c) for each function f appearing in F, the
value of f applied to every possible tuple of arguments is known, and is a
constant appearing in F. Together, these three conditions entail that there is
a one-to-one relation between objects in the domain and the named
constants of F. When the set of all constants is known, it can be used to
ground predicates to generate the set of all ground literals, which then
become the nodes in the graphical model. Different constant sets result in
different graphical models. If no constants are explicitly introduced, the
graphical model is empty (no random variables)."
----
- larger weight indicates stronger belief that the clause should hold
- MLNs are templates for constructing Markov networks for a given set of
constants
- possible world becomes exponentially less likely as the total weight of all
the grounded clauses it violates increases
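A minimal sketch of these semantics (a toy MLN with a single weighted rule Smokes(x) => Cancer(x) over two hypothetical constants; all names are illustrative):

```python
import math
from itertools import product

# one weighted first-order rule: Smokes(x) => Cancer(x), weight 1.5
constants = ["Anna", "Bob"]
w = 1.5
literals = [(p, c) for p in ("Smokes", "Cancer") for c in constants]

def n_true_groundings(world):
    # ground clause Smokes(c) => Cancer(c) is violated only when
    # Smokes(c) is true and Cancer(c) is false
    return sum(not (world[("Smokes", c)] and not world[("Cancer", c)])
               for c in constants)

# a world is a truth assignment to all ground literals
worlds = [dict(zip(literals, vals))
          for vals in product([False, True], repeat=len(literals))]
Z = sum(math.exp(w * n_true_groundings(x)) for x in worlds)

def prob(world):
    # P(X = x) = 1/Z * exp(sum_i w_i * n_i(x))
    return math.exp(w * n_true_groundings(world)) / Z

sat = dict.fromkeys(literals, True)              # no ground clause violated
vio = {**sat, ("Cancer", "Bob"): False}          # Bob smokes but has no cancer
# violating one ground clause makes a world exp(weight) times less likely
assert abs(prob(sat) / prob(vio) - math.exp(w)) < 1e-9
```

Note how the graphical model is built only from the groundings over the given constants, as the domain closure assumption above requires: a different constant set would yield a different set of worlds and a different Z.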
advantages:
----
----
implementations:
- [Alchemy](http://alchemy.cs.washington.edu)
- [Alchemy Lite](http://alchemy.cs.washington.edu/lite/)
- [Tuffy](http://i.stanford.edu/hazy/hazy/tuffy/)
([paper](http://cs.stanford.edu/people/chrismre/papers/tuffy.pdf))
- [TuffyLite](https://github.com/allenai/tuffylite)
- [LoMRF](https://github.com/anskarl/LoMRF)
- [RockIt](https://code.google.com/p/rockit/)
- [thebeast](https://code.google.com/p/thebeast/)
- [ProbKB](http://dsr.cise.ufl.edu/tag/probkb/)
([paper](https://cise.ufl.edu/~yang/doc/slg2013.pdf))
---
<http://psl.linqs.org>
<https://github.com/linqs/psl>
[overview](http://facebook.com/nipsfoundation/videos/1554329184658315/)
by Lise Getoor `video`
----
["Hinge-Loss Markov Random Fields and Probabilistic Soft Logic"](#hinge-
loss-markov-random-fields-and-probabilistic-soft-logic-bach-broecheler-
huang-getoor) by Bach, Broecheler, Huang, Getoor `paper` `summary`
----
"PSL uses first order logic rules as a template language for graphical
models over random variables with soft truth values from the interval [0, 1].
This allows one to directly incorporate similarity functions, both on the level
of individuals and on the level of sets. For instance, when modeling opinions
in social networks, PSL allows one to not only model different types of
relations between users, such as friendship or family relations, but also
multiple notions of similarity, for instance based on hobbies, beliefs, or
opinions on specific topics. Technically, PSL represents the domain of interest
as logical atoms. It uses first order logic rules to capture the dependency
structure of the domain, based on which it builds a joint probabilistic model
over all atoms. Each rule has an associated non-negative weight that
captures the rule’s relative importance. Due to the use of soft truth values,
inference (most probable explanation and marginal inference) in PSL is a
continuous optimization problem, which can be solved efficiently."
- input: set of weighted FOPL rules and a set of evidence (just as in MLN)
- atoms have continuous truth values in [0, 1] (in MLN atoms have boolean
values {0, 1})
- inference finds truth value of all atoms that best satisfy the rules and
evidence (in MLN inference finds probability of atoms given the rules and
evidence)
<https://github.com/TeamCohen/ProPPR>
----
----
- consequence: inference is fast, even for large KBs, and parameter learning
can be parallelized
- parameter learning improves from hours to seconds, and scales KBs from
thousands of entities to millions of entities
---
[**probabilistic programming**](https://github.com/brylevkirill/notes/blob/master/Probabilistic%20Programming.md)
----
[**reasoning - bayesian reasoning**](#reasoning---bayesian-reasoning)
---
[**causal inference**](https://github.com/brylevkirill/notes/blob/master/Causal%20Inference.md)
----
---
[**distributed representation**](https://github.com/brylevkirill/notes/blob/master/Deep%20Learning.md#architectures---distributed-representation)
[**natural language embeddings**](https://github.com/brylevkirill/notes/blob/master/Natural%20Language%20Processing.md#embeddings)
----
[**reasoning - neural reasoning**](#reasoning---neural-reasoning)
---
### reasoning
<https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning>
----
["Handbook of Knowledge Representation"](http://dai.fmph.uniba.sk/~sefranek/kri/handbook/) by van Harmelen, Lifschitz, Porter `book`
----
- [**natural logic**](#reasoning---natural-logic)
- [**formal logic**](#reasoning---formal-logic)
- [**bayesian reasoning**](#reasoning---bayesian-reasoning)
- [**causal reasoning**](#reasoning---causal-reasoning)
- [**neural reasoning**](#reasoning---neural-reasoning)
----
interesting papers:
- [**reasoning**](#interesting-papers---reasoning)
[**interesting recent papers**](https://github.com/brylevkirill/notes/blob/master/interesting%20recent%20papers.md#reasoning)
---
----
----
---
----
----
[*(Josef Urban)*](https://intelligence.org/2013/12/21/josef-urban-on-machine-learning-and-automated-reasoning/)
----
[overview](https://youtube.com/watch?v=ehNGGYFO6ms) of autoformalization by Christian Szegedy `video`
[overview](https://youtu.be/p_UXra-_ORQ?t=14m48s) of autoformalization by Christian Szegedy `video`
---
[**Solomonoff induction**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#universal-artificial-intelligence---solomonoff-induction)
[**active inference**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#active-inference)
----
----
----
----
*(Josh Tenenbaum)*
---
[**causal inference**](https://github.com/brylevkirill/notes/blob/master/Causal%20Inference.md)
----
---
----
----
[overview](https://youtube.com/watch?v=UAa2o0W7vcg) of differences between pattern matching and abstract modeling + reasoning by Francois Chollet `video`
["Cognitive Architectures"](https://machinethoughts.wordpress.com/2016/06/20/cognitive-architectures/) by David McAllester
----
> * can incorporate explanations via rules leading to more data efficient
training
----
([talk](http://techtalks.tv/talks/injecting-logical-background-knowledge-
into-embeddings-for-relation-extraction/61526/) `video`)
[**interesting papers**](#interesting-papers---reasoning)
----
"One of the main benefits in using neural networks is that they can be
trained to handle very subtle kinds of logic that humans use in casual
language that defy axiomatization. Propositional logic, first-order logic,
higher-order logic, modal logic, nonmonotonic logic, probabilistic logic, fuzzy
logic - none of them seem to quite be adequate; but if you use the right kind
of recursive net, you don't even have to specify logic to get it to make useful
deductions, if you have enough training data."
"Many machine reading approaches map natural language to symbolic
representations of meaning. Representations such as first-order logic capture
the richness of natural language and support complex reasoning, but often
fail due to their reliance on logical background knowledge and difficulty of
scaling up inference. In contrast, distributed representations are efficient and
enable generalization, but it is unclear how reasoning with embeddings could
support full power of symbolic representations."
---
["Moving Beyond the Turing Test with the Allen AI Science Challenge"](https://arxiv.org/abs/1604.04315) by Schoenick et al. `paper`
([slides](http://akbc.ws/2016/slides/etzioni-akbc16.pptx))
- [*Winograd Schema Challenge*](http://commonsensereasoning.org/winograd.html)
[results in 2016](http://whatsnext.nuance.com/in-the-labs/winograd-schema-challenge-2016-results/)
set of schemas - pairs of sentences that differ only in one or two words and that contain an ambiguity (e.g. "The city councilmen refused the demonstrators a permit because they feared/advocated violence.")
if the word is "feared" then "they" presumably refers to the city council; if "advocated", to the demonstrators
- Google-proof (no obvious statistical test over text corpora that will reliably disambiguate these correctly)
answer to each schema is:
- [*commonsense reasoning*](http://commonsensereasoning.org/problem_page.html)
---
- [**IBM Watson**](#machine-reading-projects---ibm-watson)
- [**AI2 Aristo**](#machine-reading-projects---ai2-aristo)
- [**Fonduer**](#machine-reading-projects---fonduer)
- [**DeepDive**](#machine-reading-projects---deepdive)
- [**NELL**](#machine-reading-projects---nell)
---
----
["Building Watson: An Overview of the DeepQA Project"](https://aaai.org/ojs/index.php/aimagazine/article/view/2303) by Ferrucci et al. `paper`
[papers](https://dropbox.com/sh/udz1kpzzz95xfd6/AADgpBmFsTS1CtkbClfmbyyqa) from IBM Watson team
---
----
----
["Markov Logic Networks for Natural Language Question Answering"](#markov-logic-networks-for-natural-language-question-answering-khot-balasubramanian-gribkoff-sabharwal-clark-etzioni) by Khot et al. `paper` `summary`
----
"Elementary grade science tests are challenging as they test a wide variety
of commonsense knowledge that human beings largely take for granted, yet
are very difficult for machines. For example, consider a question from a NY
Regents 4th Grade science test. Question 1: “When a baby shakes a rattle, it
makes a noise. Which form of energy was changed to sound energy?”
[Answer: mechanical energy] Science questions are typically quite different
from the entity-centric factoid questions extensively studied in the question
answering community, e.g., “In which year was Bill Clinton born?” While
factoid questions are usually answerable from text search or fact databases,
science questions typically require deeper analysis. A full understanding of
the above question involves not just parsing and semantic interpretation; it
involves adding implicit information to create an overall picture of the
“scene” that the text is intended to convey, including facts such as: noise is
a kind of sound, the baby is holding the rattle, shaking involves movement,
the rattle is making the noise, movement involves mechanical energy, etc.
This mental ability to create a scene from partial information is at the heart
of natural language understanding, which is essential for answering these
kinds of question. It is also very difficult for a machine because it requires
substantial world knowledge, and there are often many ways a scene can be
elaborated."
"A first step towards a machine that contains large amounts of knowledge
in machine-computable form that can answer questions, explain those
answers, and discuss those answers with users. Central to the project is
machine reading - semi-automated acquisition of knowledge from natural
language texts. We are also integrating semi-formal methods for reasoning
with knowledge, such as textual entailment and evidential reasoning, and a
robust hybrid architecture that has multiple reasoning modules operating in
tandem."
---
----
---
[Fonduer](https://github.com/HazyResearch/fonduer) project
----
[overview](https://youtube.com/watch?v=VrGM5Qw5xpo) by Sen Wu
`video`
----
----
[DeepDive](http://deepdive.stanford.edu) project
----
[overview](http://videolectures.net/nipsworkshops2013_re_archaeological_te
xts/) by Chris Re `video`
----
["DeepDive: Design Principles"](http://cs.stanford.edu/people/chrismre/papers/dd.pdf) by Chris Re `paper`
[overview](http://deepdive.stanford.edu/kbc)
----
[**Markov Logic Network**](#probabilistic-database---markov-logic-network) as knowledge representation
----
[showcases](http://deepdive.stanford.edu/showcase/apps)
[PaleoDB](http://nature.com/news/computers-read-the-fossil-record-1.17868)
["Overview of the English Slot Filling Track at the TAC2014 Knowledge Base Population Evaluation"](http://nlp.cs.rpi.edu/paper/sf2014overview.pdf) by Surdeanu and Ji `paper`
----
"DeepDive takes a radical view for a data processing system: it views every
piece of data as an imperfect observation - which may or may not be correct.
It uses these observations and domain knowledge expressed by the user to
build a statistical model. One output of this massive model is the most likely
database. As aggressive as this approach may sound, it allows for a
dramatically simpler user interaction with the system than many of today’s
machine learning or extraction systems."
---
----
[overview](http://videolectures.net/nipsworkshops2013_taludkar_language_le
arning/) by Partha Talukdar `video`
----
["Never-Ending Learning"](#never-ending-learning-mitchell-et-al) by
Mitchell et al. `paper` `summary`
----
semi-supervised learning of a single concept is hard (underconstrained); learning many related concepts together is much easier (more constrained)
- promoting a set of beliefs consistent with the ontology and each other
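A toy sketch of that promotion step (the categories, beliefs, and mutual-exclusion constraint are hypothetical, not NELL's actual ontology): a candidate belief is promoted only if it does not contradict the ontology given the beliefs already held.

```python
# pairs of categories the toy ontology declares mutually exclusive
mutually_exclusive = {("city", "person"), ("person", "city")}
beliefs = {("Paris", "city")}  # already-promoted beliefs

def can_promote(entity, category):
    # consistent with existing beliefs about this entity under the ontology
    return all((category, c) not in mutually_exclusive
               for e, c in beliefs if e == entity)

assert can_promote("Paris", "location")        # no conflict with "city"
assert not can_promote("Paris", "person")      # "person" excludes "city"
```

Coupling many such constraints across related categories and relations is what turns the underconstrained single-concept problem into a constrained joint one.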
---
- [**knowledge bases**](#interesting-papers---knowledge-bases)
- [**reasoning**](#interesting-papers---reasoning)
[**interesting recent papers**](https://github.com/brylevkirill/notes/blob/master/interesting%20recent%20papers.md#reasoning)
---
`Microsoft Satori`
- `post` <https://devblogs.microsoft.com/dotnet/announcing-ml-net-0-6-machine-learning-net>
- `video` <http://videolectures.net/kdd2014_murphy_knowledge_vault/> (Murphy)
- `video` <http://videolectures.net/kdd2014_gabrilovich_bordes_knowledge_graphs/> (Gabrilovich)
- `slides` <http://cikm2013.org/slides/kevin.pdf>
`Fonduer`
- `post` <https://hazyresearch.github.io/snorkel/blog/fonduer.html>
- `code` <https://github.com/HazyResearch/fonduer>
- [**Snorkel**](https://github.com/brylevkirill/notes/blob/master/Machine%20Learning.md#weak-supervision---data-programming) project
`DeepDive`
- `notes` <https://blog.acolyer.org/2016/10/07/incremental-knowledge-base-construction-using-deepdive/>
#### ["Never-Ending Learning"](https://aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/10049) Mitchell et al.
`NELL`
> "Whereas people learn many different types of knowledge from diverse
experiences over many years, most current machine learning systems
acquire just a single function or data model from just a single data set. We
propose a never-ending learning paradigm for machine learning, to better
reflect the more ambitious and encompassing type of learning performed by
humans. As a case study, we describe the Never-Ending Language Learner,
which achieves some of the desired properties of a never-ending learner, and
we discuss lessons learned. NELL has been learning to read the web 24
hours/day since January 2010, and so far has acquired a knowledge base
with over 80 million confidence-weighted beliefs (e.g., servedWith(tea,
biscuits)), while learning continually to improve its reading competence over
time. NELL has also learned to reason over its knowledge base to infer new
beliefs from old ones, and is now beginning to extend its ontology by
synthesizing new relational predicates."
- `video` <http://videolectures.net/akbcwekex2012_mitchell_language_learning/> (Mitchell)
- `video` <http://videolectures.net/nipsworkshops2013_taludkar_language_learning/> (Talukdar)
`Yago`
> "We present YAGO3, an extension of the YAGO knowledge base that
combines the information from the Wikipedias in multiple languages. Our
technique fuses the multilingual information with the English WordNet to
build one coherent knowledge base. We make use of the categories, the
infoboxes, and Wikidata, and learn the meaning of infobox attributes across
languages. We run our method on 10 different languages, and achieve a
precision of 95%-100% in the attribute mapping. Our technique enlarges
YAGO by 1m new entities and 7m new facts."
#### ["Automatic Construction of Inference-Supporting Knowledge Bases"](http://akbc.ws/2014/submissions/akbc2014_submission_32.pdf) Clark et al.
`AI2`
- `slides` <https://drive.google.com/file/d/0B_hicYJxvbiOd3pwZTNnaDRHdFU>
---
- `slides` <http://cs.cmu.edu/~nlao/publication/2012/defense.pdf>
- `code` <https://github.com/matt-gardner/pra>
> "We explore some of the practicalities of using random walk inference
methods, such as the Path Ranking Algorithm, for the task of knowledge base
completion. We show that the random walk probabilities computed (at great
expense) by PRA provide no discernible benefit to performance on this task,
and so they can safely be dropped. This result allows us to define a simpler
algorithm for generating feature matrices from graphs, which we call
subgraph feature extraction. In addition to being conceptually simpler than
PRA, SFE is much more efficient, reducing computation by an order of
magnitude, and more expressive, allowing for much richer features than just
paths between two nodes in a graph. We show experimentally that this
technique gives substantially better performance than PRA and its variants,
improving mean average precision from .432 to .528 on a knowledge base
completion task using the NELL knowledge base."
> "We have explored several practical issues that arise when using the
path ranking algorithm for knowledge base completion. An analysis of
several of these issues led us to propose a simpler algorithm, which we
called subgraph feature extraction, which characterizes the subgraph around
node pairs and extracts features from that subgraph. SFE is both significantly
faster and performs better than PRA on this task. We showed experimentally
that we can reduce running time by an order of magnitude, while at the
same time improving mean average precision from .432 to .528 and mean
reciprocal rank from .850 to .933. This thus constitutes the best published
results for knowledge base completion on NELL data."
> "As a final point of future work, we note that we have taken PRA, an
elegant model that has strong ties to logical inference, and reduced it with
SFE to feature extraction over graphs. It seems clear experimentally that this
is a significant improvement, but somehow doing feature engineering over
graphs is not incredibly satisfying. We introduced a few kinds of features with
SFE that we thought might work well, but there are many, many more kinds
of features we could have experimented with (e.g., counts of paths found,
path unigrams, path trigrams, conjunctions of simple paths, etc.). How
should we wade our way through this mess of feature engineering? This is
not a task we eagerly anticipate. One possible way around this is to turn to
deep learning, whose promise has always been to push the task of feature
engineering to the neural network. Some initial work on creating embeddings
of graphs has been done (Bruna et al., 2013), but that work dealt with
unlabeled graphs and would need significant modification to work in this
setting. The recursive neural network of Neelakantan et al. (2015) is also a
step in the right direction, though the spectral networks of Bruna et al. seem
closer to the necessary network structure here."
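The core of SFE described above - characterizing a node pair by the relation paths that connect it, without PRA's random-walk probabilities - can be sketched like this (toy graph; the entities and relations are illustrative, not from NELL):

```python
from collections import defaultdict

# toy knowledge graph as (source, relation, target) triples
edges = [
    ("tea", "servedWith", "biscuits"),
    ("biscuits", "typeOf", "food"),
    ("tea", "typeOf", "beverage"),
]
graph = defaultdict(list)
for s, r, t in edges:
    graph[s].append((r, t))

def path_features(start, end, max_len=2):
    # enumerate relation paths of length <= max_len from start to end;
    # each distinct path becomes one binary feature for the (start, end) pair
    feats, frontier = set(), [(start, ())]
    for _ in range(max_len):
        step = []
        for node, path in frontier:
            for r, t in graph[node]:
                p = path + (r,)
                if t == end:
                    feats.add(p)
                step.append((t, p))
        frontier = step
    return feats

assert ("servedWith", "typeOf") in path_features("tea", "food")
```

A linear classifier over such binary path features (plus the richer subgraph features the paper mentions) is then trained per relation for knowledge base completion.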
> "Over the past few years, Markov Logic Networks have emerged as a
powerful AI framework that combines statistical and logical reasoning. It has
been applied to a wide range of data management problems, such as
information extraction, ontology matching, and text mining, and has become
a core technology underlying several major AI projects. Because of its
growing popularity, MLNs are part of several research programs around the
world. None of these implementations, however, scale to large MLN data
sets. This lack of scalability is now a key bottleneck that prevents the
widespread application of MLNs to real-world data management problems. In
this paper we consider how to leverage RDBMSes to develop a solution to
this problem. We consider Alchemy, the state-of-the-art MLN implementation
currently in wide use. We first develop bTuffy, a system that implements
Alchemy in an RDBMS. We show that bTuffy already scales to much larger
datasets than Alchemy, but suffers from a sequential processing problem
(inherent in Alchemy). We then propose cTuffy that makes better use of the
RDBMS’s set-at-a-time processing ability. We show that this produces
dramatic benefits: on all four benchmarks cTuffy dominates both Alchemy
and bTuffy. Moreover, on the complex entity resolution benchmark cTuffy
finds a solution in minutes, while Alchemy spends hours unsuccessfully. We
summarize the lessons we learnt, on how we can design AI algorithms to
take advantage of RDBMSes, and extend RDBMSes to work better for AI
algorithms."
- `paper` ["Markov
Logic"](http://homes.cs.washington.edu/~pedrod/papers/pilp.pdf) by
Domingos et al.
> "We presented a novel inference algorithm for TPKBs that is disk-
based, parallel, and sublinear. We also derived closed-form maximum
likelihood estimates for TPKB parameters. We used these results to learn a
large TPKB from multiple data sources and applied it to information
extraction and integration problems. The TPKB outperformed existing
algorithms in accuracy and efficiency. Future work will be concerned with
more sophisticated smoothing approaches, the comparison of different
learning strategies, and the problem of structure learning. We also plan to
apply TPKBs to a wide range of problems that benefit from tractable
probabilistic knowledge representations."
- `code` <http://alchemy.cs.washington.edu/lite/>
> "One important challenge for probabilistic logics is reasoning with very
large knowledge bases of imperfect information, such as those produced by
modern web-scale information extraction systems. One scalability problem
shared by many probabilistic logics is that answering queries involves
“grounding” the query - i.e., mapping it to a propositional representation -
and the size of a “grounding” grows with database size. To address this
bottleneck, we present a first-order probabilistic language called ProPPR in
which approximate “local groundings” can be constructed in time
independent of database size. Technically, ProPPR is an extension to
Stochastic Logic Programs that is biased towards short derivations; it is also
closely related to an earlier relational learning algorithm called the Path
Ranking Algorithm. We show that the problem of constructing proofs for this
logic is related to computation of Personalized PageRank on a linearized
version of the proof space, and using this connection, we develop a provably-correct approximate grounding scheme based on the PageRank-Nibble algorithm. Building on this, we develop a fast and easily-parallelized
weight-learning algorithm for ProPPR. In experiments, we show that learning
for ProPPR is orders of magnitude faster than learning for Markov Logic
Networks; that allowing mutual recursion (joint learning) in KB inference
leads to improvements in performance; and that ProPPR can learn weights
for a mutually recursive program with hundreds of clauses, which define
scores of interrelated predicates, over a KB containing one million entities."
- `code` <https://github.com/TeamCohen/ProPPR>
- [**ProPPR**](#probabilistic-database---proppr)
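The connection to Personalized PageRank described above can be made concrete. A minimal sketch under toy assumptions (the three-node `graph` stands in for a linearized proof space, with node 0 as the query seed): it runs exact power iteration, i.e. it computes the quantity that ProPPR's PageRank-Nibble grounding approximates locally, not the approximation itself.

```python
# power iteration for personalized PageRank on a tiny directed graph;
# ProPPR grounds queries with an approximate local variant (PageRank-Nibble),
# this sketch only shows the exact quantity being approximated
graph = {0: [1, 2], 1: [2], 2: [0]}   # hypothetical proof-space graph
alpha = 0.15                          # restart probability
n = 3
seed = 0                              # personalization: restart at node 0

p = [1.0 if i == seed else 0.0 for i in range(n)]
for _ in range(100):
    nxt = [alpha if i == seed else 0.0 for i in range(n)]
    for u, outs in graph.items():
        for v in outs:
            nxt[v] += (1 - alpha) * p[u] / len(outs)
    p = nxt

assert abs(sum(p) - 1.0) < 1e-6   # scores form a probability distribution
assert p[seed] == max(p)          # mass concentrates near the seed
```

Because most mass stays near the seed, a local approximation can prune distant nodes, which is what makes grounding sublinear in database size.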
#### ["Structure Learning via Parameter
Learning"](https://www.cs.cmu.edu/~wcohen/postscript/cikm-2014-
structure.pdf) Wang, Mazaitis, Cohen
- `code` <https://github.com/TeamCohen/ProPPR>
- [**ProPPR**](#probabilistic-database---proppr)
---
[papers](https://github.com/thunlp/KRLPapers)
----
`Universal Schema`
> "Traditional relation extraction predicts relations within some fixed and
finite target schema. Machine learning approaches to this task require either
manual annotation or, in the case of distant supervision, existing structured
sources of the same schema. The need for existing datasets can be avoided
by using a universal schema: the union of all involved schemas (surface form
predicates as in OpenIE, and relations in the schemas of preexisting
databases). This schema has an almost unlimited set of relations (due to
surface forms), and supports integration with existing structured data
(through the relation types of existing databases). To populate a database of
such schema we present matrix factorization models that learn latent feature
vectors for entity tuples and relations. We show that such latent models
achieve substantially higher accuracy than a traditional classification
approach. More importantly, by operating simultaneously on relations
observed in text and in pre-existing structured DBs such as Freebase, we are
able to reason about unstructured and structured data in mutually-
supporting ways."
- `video` <http://techtalks.tv/talks/relation-extraction-with-matrix-factorization-and-universal-schemas/58435/> (Riedel)
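The factorization described above can be sketched in a few lines. Everything here is illustrative — made-up entity tuples, relations, and facts, and plain logistic matrix factorization trained by SGD rather than the paper's exact models and loss:

```python
import numpy as np

# toy universal-schema matrix: rows are entity tuples, columns are relations
# (KB relations and OpenIE surface-form predicates share one column space)
pairs = ["(einstein, ulm)", "(turing, london)"]          # invented rows
relations = ["born_in", "X was born in Y"]               # invented columns
observed = {(0, 0), (0, 1), (1, 1)}                      # observed facts

rng = np.random.default_rng(0)
k = 8
P = rng.normal(scale=0.1, size=(len(pairs), k))      # entity-tuple embeddings
Q = rng.normal(scale=0.1, size=(len(relations), k))  # relation embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# logistic matrix factorization by SGD; unobserved cells act as negatives
for _ in range(500):
    for i in range(len(pairs)):
        for j in range(len(relations)):
            y = 1.0 if (i, j) in observed else 0.0
            g = sigmoid(P[i] @ Q[j]) - y
            P[i], Q[j] = P[i] - 0.5 * g * Q[j], Q[j] - 0.5 * g * P[i]

# an observed fact now scores high; the cell treated as negative scores low
assert sigmoid(P[0] @ Q[0]) > 0.5
assert sigmoid(P[1] @ Q[0]) < 0.5
```

Because surface-form predicates and KB relations live in the same low-rank space, regularities observed in text can transfer to structured relations and vice versa.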
`Universal Schema`
> "In this paper we explore a row-less extension of universal schema that
forgoes explicit row representations for an aggregation function over its
observed columns. This extension allows prediction between all rows in new
textual mentions - whether seen at train time or not - and also provides a
natural connection to the provenance supporting the prediction. Our models
also have a smaller memory footprint. In this work we show that an
aggregation function based on query-specific attention over relation types
outperforms query independent aggregations. We show that aggregation
models are able to predict on par with models with explicit row
representations on seen row entries."
- `video` <http://www.fields.utoronto.ca/video-archive/2016/11/2267-16181> (30:45) (McCallum)
- `slides` <http://akbc.ws/2016/slides/verga-akbc16.pdf>
#### ["Chains of Reasoning over Entities, Relations, and Text using
Recurrent Neural Networks"](http://arxiv.org/abs/1607.01426) Das,
Neelakantan, Belanger, McCallum
`Universal Schema`
> "Our goal is to combine the rich multistep inference of symbolic logical
reasoning with the generalization capabilities of neural networks. We are
particularly interested in complex reasoning about entities and relations in
text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use
RNNs to compose the distributed semantics of multi-hop paths in KBs;
however for multiple reasons, the approach lacks accuracy and practicality.
This paper proposes three significant modeling advances: (1) we learn to
jointly reason about relations, entities, and entity-types; (2) we use neural
attention modeling to incorporate multiple paths; (3) we learn to share
strength in a single RNN that represents logical composition across all
relations. On a large-scale Freebase+ClueWeb prediction task, we achieve
25% error reduction, and a 53% error reduction on sparse relations due to
shared strength. On chains of reasoning in WordNet we reduce error in mean
quantile by 84% versus previous state-of-the-art."
- `video`
<http://videolectures.net/deeplearning2016_das_neural_networks/> (Das)
- `video` <http://www.fields.utoronto.ca/video-archive/2016/11/2267-16181> (33:56) (McCallum)
- `code` <https://github.com/rajarshd/ChainsofReasoning>
#### ["Go for a Walk and Arrive at the Answer: Reasoning Over Paths in
Knowledge Bases using Reinforcement
Learning"](https://arxiv.org/abs/1711.05851) Das, Dhuliawala, Zaheer, Vilnis,
Durugkar, Krishnamurthy, Smola, McCallum
- `code` <https://github.com/shehzaadzd/MINERVA>
`RESCAL`
> "RESCAL models each entity with a vector, and each relation with a
matrix, and computes the probability of an entity pair belonging to a relation
by multiplying the relation matrix on either side with the entity vectors."
- `video` <http://videolectures.net/eswc2014_tresp_machine_learning/> (Tresp) ([slides](http://www.dbs.ifi.lmu.de/~tresp/papers/ESWC-Keynote.pdf))
- `code` <https://github.com/mnick/scikit-tensor>
- `code` <https://github.com/mnick/rescal.py>
- `code` <https://github.com/nzhiltsov/Ext-RESCAL> ([post](http://nzhiltsov.blogspot.ru/2014/10/ext-rescal-tensor-factorization.html))
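The bilinear scoring rule quoted above is compact enough to sketch directly. Entities, the relation, and the (untrained, randomly initialized) parameters below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # embedding dimension (illustrative)
# RESCAL: one vector per entity, one matrix per relation
e = {"paris": rng.normal(size=d), "france": rng.normal(size=d)}
R = {"capital_of": rng.normal(size=(d, d))}

def rescal_score(subj, rel, obj):
    # bilinear form: e_subj^T  R_rel  e_obj
    return float(e[subj] @ R[rel] @ e[obj])

def rescal_prob(subj, rel, obj):
    # squash the raw score into a probability with a sigmoid
    return 1.0 / (1.0 + np.exp(-rescal_score(subj, rel, obj)))

p = rescal_prob("paris", "capital_of", "france")
assert 0.0 < p < 1.0
```

In practice the vectors and matrices are fit jointly (e.g. by alternating least squares over the relation tensor); here only the scoring function is shown.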
`type-constrained RESCAL`
`embedding of logic`
- `poster` <http://sameersingh.org/files/papers/lowranklogic-starai14-poster.pdf>
- `video` <http://techtalks.tv/talks/injecting-logical-background-knowledge-into-embeddings-for-relation-extraction/61526/> (Rocktaschel)
- `slides` <http://yoavartzi.com/sp14/slides/rockt.sp14.pdf>
- `code` <https://github.com/uclmr/low-rank-logic>
`embedding of logic`
> "For a long time, practitioners have been reluctant to use embedding
models because many common relationships, including the sameAs relation
modeled as an identity matrix, were not trivially seen as low-rank. In this
paper we showed that when sign-rank based binary loss is minimized, many
common relations such as permutation matrices, sequential relationships,
and transitivity can be represented by surprisingly small embeddings."
- `slides` <https://drive.google.com/file/d/0B_hicYJxvbiObnNYZ0cyMkotUzQ>
`embedding of logic`
> "Since such common sense formulae are often not directly observed in distant
supervision, they can go a long way in fixing common extraction errors.
Finally, we will investigate methods to automatically mine commonsense
knowledge for injection into embeddings from additional resources such as
Probase or directly from text using a semantic parser."
> "We propose to inject formulae into the embeddings of relations and
entity-pairs, i.e., estimate the embeddings such that predictions based on
them conform to given logic formulae. We refer to such embeddings as low-
rank logic embeddings. Akin to matrix factorization, inference of a fact at
test time still amounts to an efficient dot product of the corresponding
relation and entity-pair embeddings, and logical inference is not needed. We
present two techniques for injecting logical background knowledge, pre-
factorization inference and joint optimization, and demonstrate in
subsequent sections that they generalize better than direct logical inference,
even if such inference is performed on the predictions of the matrix
factorization model."
> "The intuition is that the additional training data generated by the
formulae provide evidence of the logical dependencies between relations to
the matrix factorization model, while at the same time allowing the
factorization to generalize to unobserved facts and to deal with ambiguity
and noise in the data. No further logical inference is performed during or
after training of the factorization model as we expect that the learned
embeddings encode the given formulae. One drawback of pre-factorization
inference is that the formulae are enforced only on observed atoms, i.e.,
first-order dependencies on predicted facts are ignored. Instead we would
like to include a loss term for the logical formulae directly in the matrix
factorization objective, thus jointly optimizing embeddings to reconstruct
factual training data as well as obeying to first-order logical background
knowledge."
> "Even if the embeddings could enable perfect logical reasoning, how
do we provide provenance or proofs of answers? Moreover, in practice a
machine reader (e.g. a semantic parser) incrementally gathers logical
statements from text - how could we incrementally inject this knowledge into
embeddings without retraining the whole model? Finally, what are the
theoretical limits of embedding logic in vector spaces?"
> Formulae can be injected into embeddings
- `video` <http://techtalks.tv/talks/injecting-logical-background-knowledge-into-embeddings-for-relation-extraction/61526/> (Rocktaschel)
- `code` <https://github.com/uclmr/low-rank-logic>
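Of the two injection techniques described above, pre-factorization inference is easy to sketch: apply the implication formulae to the observed atoms until a fixpoint, then hand the enlarged fact set to the factorization model as extra training data. Relation names and the facts below are made up:

```python
# pre-factorization inference: derive new training atoms from logical
# implications before running matrix factorization (the factorization
# itself is not shown); note that rules are applied only to observed
# atoms, which is exactly the limitation discussed above
observed = {("professorAt", "X", "Y")}                  # invented fact
rules = [("professorAt", "employeeAt"),                 # body => head
         ("employeeAt", "affiliatedWith")]

def pre_factorization_inference(facts, rules):
    derived = set(facts)
    changed = True
    while changed:              # iterate to a fixpoint over chained rules
        changed = False
        for body, head in rules:
            for (rel, s, o) in list(derived):
                if rel == body and (head, s, o) not in derived:
                    derived.add((head, s, o))
                    changed = True
    return derived

training_facts = pre_factorization_inference(observed, rules)
assert ("employeeAt", "X", "Y") in training_facts
assert ("affiliatedWith", "X", "Y") in training_facts
```

Joint optimization instead adds a loss term for the formulae to the factorization objective, so the rules also constrain predicted (not just observed) facts.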
`embedding of logic`
> "Finally, we explicitly discuss the main differences with respect to the
strongly related work from Rocktaschel et al. (2015). Their method is more
general, as they cover a wide range of first-order logic rules, whereas we
only discuss implications. Lifted rule injection beyond implications will be
studied in future research contributions. However, albeit less general, our
model has a number of clear advantages:
> Scalability - Our proposed model of lifted rule injection scales according
to the number of implication rules, instead of the number of rules times the
number of observed facts for every relation present in a rule.
> Generalizability - Injected implications will hold even for facts not seen
during training, because their validity only depends on the order relation
imposed on the relation representations. This is not guaranteed when
training on rules grounded in training facts by Rocktaschel et al. (2015).
> Training Flexibility - Our method can be trained with various loss
functions, including the rank-based loss as used in Riedel et al. (2013). This
was not possible for the model of Rocktaschel et al. (2015) and already leads
to an improved accuracy."
- `video` <http://techtalks.tv/talks/lifted-rule-injection-for-relation-embeddings/63332/> (Demeester)
---
[**interesting recent papers**](https://github.com/brylevkirill/notes/blob/master/interesting%20recent%20papers.md#reasoning)
----
> "Large pre-trained language models have been shown to store factual
knowledge in their parameters, and achieve state-of-the-art results when
fine-tuned on downstream NLP tasks. However, their ability to access and
precisely manipulate knowledge is still limited, and hence on knowledge-
intensive tasks, their performance lags behind task-specific architectures.
Additionally, providing provenance for their decisions and updating their
world knowledge remain open research problems. Pre-trained models with a
differentiable access mechanism to explicit non-parametric memory can
overcome this issue, but have so far been only investigated for extractive
downstream tasks. We explore a general-purpose fine-tuning recipe for
retrieval-augmented generation (RAG) — models which combine pre-trained
parametric and non-parametric memory for language generation. We
introduce RAG models where the parametric memory is a pre-trained
seq2seq model and the non-parametric memory is a dense vector index of
Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG
formulations, one which conditions on the same retrieved passages across
the whole generated sequence, the other can use different passages per
token. We fine-tune and evaluate our models on a wide range of knowledge-
intensive NLP tasks and set the state-of-the-art on three open domain QA
tasks, outperforming parametric seq2seq models and task-specific retrieve-
and-extract architectures. For language generation tasks, we find that RAG
models generate more specific, diverse and factual language than a state-of-
the-art parametric-only seq2seq baseline."
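The marginalization behind RAG-style generation can be written out with toy numbers: the retriever supplies p(z|x) over passages and the generator supplies p(y|x,z), so the model scores outputs by the mixture p(y|x) = Σ_z p(z|x)·p(y|x,z). This matches the formulation that conditions on the same passages for the whole output; every probability below is invented:

```python
import numpy as np

# toy retrieve-then-generate marginalization for one question x,
# two retrieved passages z, and two candidate answers y
retriever = np.array([0.7, 0.3])           # p(z | x) over 2 passages
generator = np.array([[0.9, 0.1],          # p(y | x, z=0) over 2 answers
                      [0.2, 0.8]])         # p(y | x, z=1)

p_y = retriever @ generator                # marginal p(y | x)

assert np.allclose(p_y.sum(), 1.0)
assert np.allclose(p_y, [0.7 * 0.9 + 0.3 * 0.2, 0.7 * 0.1 + 0.3 * 0.8])
```

Because both factors are differentiable, gradients flow into the retriever and the generator jointly during fine-tuning.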
`REALM`
- `post` <https://ai.googleblog.com/2020/08/realm-integrating-retrieval-into.html>
- `post` <https://joeddav.github.io/blog/2020/03/03/REALM.html>
- `paper` ["How Much Knowledge Can You Pack Into the Parameters of a Language Model?"](https://craffel.github.io/publications/arxiv2020how.pdf) by Roberts, Raffel, Shazeer
#### ["No Need to Pay Attention: Simple Recurrent Neural Networks Work!
(for Answering 'Simple' Questions)"](http://arxiv.org/abs/1606.05029) Ture,
Jojic
`relation classification`
> "First-order factoid question answering assumes that the question can
be answered by a single fact in a knowledge base (KB). While this does not
seem like a challenging task, many recent attempts that apply either
complex linguistic reasoning or deep neural networks achieve 65%-76%
accuracy on benchmark sets. Our approach formulates the task as two
machine learning problems: detecting the entities in the question, and
classifying the question as one of the relation types in the KB. We train a
recurrent neural network to solve each problem. On the SimpleQuestions
dataset, our approach yields substantial improvements over previously
published results --- even neural networks based on much more complex
architectures. The simplicity of our approach also has practical advantages,
such as efficiency and modularity, that are valuable especially in an industry
setting. In fact, we present a preliminary analysis of the performance of our
model on real queries from Comcast's X1 entertainment platform with
millions of users every day."
#### ["Question Answering with Subgraph
Embeddings"](http://emnlp2014.org/papers/pdf/EMNLP2014067.pdf) Bordes,
Chopra, Weston
`entity embedding`
> "It did as well as the best previous methods on the WebQuestions
dataset, yet doesn't use parsers (semantic and/or syntactic) or logic
reasoning engines. All it does is some arithmetic over vectors formed from
words and relations from both a knowledge base and the question; it finds an
optimal "embedding matrix" W and it does some matrix multiplication to
score question-answer pairs. One limitation is that it doesn't care about word
order in the question -- it's basically a "bag of words" setup. Another is that it
limits the complexity of the reasoning quite a lot. Another, still, is that it can
only spit out entities and paths to get to the answer -- it can't return whole
paragraphs, for instance."
- `notes` <http://www.shortscience.org/paper?bibtexKey=journals%2Fcorr%2F1406.3676>
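The bag-of-words scoring noted above can be sketched directly; the vocabulary and vectors are made up, and no training is shown. The permutation check at the end illustrates the word-order limitation mentioned in the notes:

```python
import numpy as np

# an embedding table W (the "embedding matrix") maps question words and
# KB symbols into one space; a question-answer pair is scored by a dot
# product between the summed question vector and the answer embedding
rng = np.random.default_rng(0)
d = 5
W = {w: rng.normal(size=d)
     for w in ["who", "directed", "avatar", "james_cameron"]}  # invented

def embed_question(question):
    return sum(W[w] for w in question.split())   # word order is discarded

def score(question, answer_entity):
    return float(embed_question(question) @ W[answer_entity])

# the "bag of words" limitation: permuting the question words
# leaves the score (essentially) unchanged
s1 = score("who directed avatar", "james_cameron")
s2 = score("avatar who directed", "james_cameron")
assert np.isclose(s1, s2)
```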
`program embedding`
- <https://github.com/brylevkirill/notes/blob/master/Deep%20Learning.md#neural-programmer-inducing-latent-programs-with-gradient-descent-neelakantan-le-sutskever>
- `video` <http://research.microsoft.com/apps/video/default.aspx?id=260024> (10:45) (Darrell)
- `post` <http://blog.jacobandreas.net/programming-with-nns.html>
- `notes` <https://github.com/abhshkdz/papers/blob/master/reviews/deep-compositional-question-answering-with-neural-module-networks.md>
- `code` <http://github.com/jacobandreas/nmn2>
#### ["Learning to Compose Neural Networks for Question Answering"]
(http://arxiv.org/abs/1601.01705) Andreas, Rohrbach, Darrell, Klein
> "We describe a question answering model that applies to both images
and structured knowledge bases. The model uses natural language strings to
automatically assemble neural networks from a collection of composable
modules. Parameters for these modules are learned jointly with network-
assembly parameters via reinforcement learning, with only (world, question,
answer) triples as supervision. Our approach, which we term a dynamic
neural model network, achieves state-of-the-art results on benchmark
datasets in both visual and structured domains."
- `post` <http://blog.jacobandreas.net/programming-with-nns.html>
- `code` <http://github.com/jacobandreas/nmn2>
- <http://ronghanghu.com/n2nmn/>
- `post` <http://bair.berkeley.edu/blog/2017/06/20/learning-to-reason-with-neural-module-networks>
- `code` <https://github.com/ronghanghu/n2nmn>
- `code` <https://github.com/tensorflow/models/tree/master/research/qa_kg>
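The module-assembly idea can be illustrated with plain functions standing in for neural modules. The `world`, the module inventory, and the layout are all invented, and the learned part (predicting the layout from the question) is omitted:

```python
# a dynamic neural module network assembles a network from composable
# modules according to a layout derived from the question; this toy
# version composes ordinary functions instead of neural modules
world = [{"shape": "ball", "color": "red"},
         {"shape": "cube", "color": "red"},
         {"shape": "cube", "color": "blue"}]

modules = {
    "find":  lambda arg: [x for x in world if arg in x.values()],
    "count": lambda xs: len(xs),
}

def run(layout):
    # layout is a pipeline: the first module takes its own argument,
    # each later module consumes the previous module's output
    head, arg = layout[0]
    out = modules[head](arg)
    for name in layout[1:]:
        out = modules[name](out)
    return out

# "how many red things?"  ->  count(find(red))
assert run([("find", "red"), "count"]) == 2
```

In the actual model each module is a small differentiable network and the assembly choice is trained with reinforcement learning from (world, question, answer) triples.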
`entity embedding`
> "We have two key empirical findings: First, we show that compositional
training enables us to answer path queries up to at least length 5 by
substantially reducing cascading errors present in the base vector space
model. Second, we find that somewhat surprisingly, compositional training
also improves upon state-of-the-art performance for knowledge base
completion, which is a special case of answering unit length path queries.
Therefore, compositional training can also be seen as a new form of
structural regularization for existing models."
- `notes` <https://codalab.org/worksheets/0xfcace41fdeec45f3bc6ddf31107b829f>
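Path-query answering in a vector space can be sketched with a TransE-style compositional model, where traversing a relation is vector addition; the paper evaluates several compositional models, and the entities and relations below are contrived so that the toy example is exact:

```python
import numpy as np

# traversing relation r maps an entity vector e to e + w_r, so a path
# query  s / r1 / r2  is scored against candidate answers by distance
# to  s + w_r1 + w_r2; all names and vectors here are invented
rng = np.random.default_rng(0)
d = 4
w = {"parent_of": rng.normal(size=d), "lives_in": rng.normal(size=d)}
ent = {"alice": rng.normal(size=d)}
# place the correct answers exactly at the translated points (toy setup)
ent["bob"] = ent["alice"] + w["parent_of"]
ent["lille"] = ent["bob"] + w["lives_in"]
ent["tokyo"] = rng.normal(size=d)

def traverse(start, path):
    v = ent[start]
    for r in path:
        v = v + w[r]
    return v

def answer(start, path, candidates):
    v = traverse(start, path)
    return min(candidates, key=lambda c: np.linalg.norm(ent[c] - v))

# "where does alice's parent live?"
assert answer("alice", ["parent_of", "lives_in"], ["lille", "tokyo"]) == "lille"
```

Compositional training fits the embeddings so that such multi-hop traversals land near the right answers, which is what reduces the cascading error described above.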
> "In this work, we approach the problem of semantic parsing from a
paraphrasing viewpoint. A fundamental motivation and long standing goal of
the paraphrasing and RTE communities has been to cast various semantic
applications as paraphrasing/textual entailment."
- `code` <http://www-nlp.stanford.edu/software/sempre/>
#### ["Building a Semantic Parser
Overnight"](https://nlp.stanford.edu/pubs/wang-berant-liang-acl2015.pdf)
Wang, Berant, Liang
- `code` <http://www-nlp.stanford.edu/software/sempre/>
> "In this paper, we introduce a new semantic parsing approach for
Freebase. A key idea in our work is to exploit the structural and conceptual
similarities between natural language and Freebase through a common
graph-based representation. We formalize semantic parsing as a graph
matching problem and learn a semantic parser without using annotated
question-answer pairs. We have shown how to obtain graph representations
from the output of a CCG parser and subsequently learn their
correspondence to Freebase using a rich feature set and their denotations as
a form of weak supervision. Our parser yields state-of-the-art performance on
three large Freebase domains and is not limited to question answering. We
can create semantic parses for any type of NL sentences. Our work brings
together several strands of research. Graph-based representations of
sentential meaning have recently gained some attention in the literature
(Banarescu'2013), and attempts to map sentences to semantic graphs have
met with good inter-annotator agreement. Our work is also closely related to
Kwiatkowski'2013, Berant and Liang'2014 who present open-domain
semantic parsers based on Freebase and trained on QA pairs. Despite
differences in formulation and model structure, both approaches have
explicit mechanisms for handling the mismatch between natural language
and the KB (e.g., using logical-type equivalent operators or paraphrases).
The mismatch is handled implicitly in our case via our graphical
representation which allows for the incorporation of all manner of powerful
features. More generally, our method is based on the assumption that
linguistic structure has a correspondence to Freebase structure which does
not always hold (e.g., in Who is the grandmother of Prince William?,
grandmother is not directly expressed as a relation in Freebase). Additionally,
our model fails when questions are too short without any lexical clues (e.g.,
What did Charles Darwin do?). Supervision from annotated data or
paraphrasing could improve performance in such cases. In the future, we
plan to explore cluster-based semantics (Lewis and Steedman'2013) to
increase the robustness on unseen NL predicates. Our work joins others in
exploiting the connections between natural language and open-domain
knowledge bases. Recent approaches in relation extraction use distant
supervision from a knowledge base to predict grounded relations between
two target entities. During learning, they aggregate sentences containing the
target entities, ignoring richer contextual information. In contrast, we learn
from each individual sentence taking into account all entities present, their
relations, and how they interact. Krishnamurthy and Mitchell'2012 formalize
semantic parsing as a distantly supervised relation extraction problem
combined with a manually specified grammar to guide semantic parse
composition. Finally, our approach learns a model of semantics guided by
denotations as a form of weak supervision."
- `video` <http://techtalks.tv/talks/large-scale-semantic-parsing-without-question-answer-pairs/61530/> (Reddy)
> - Identify possible entities in the question (e.g., Meg, Family Guy)
> DSSM measures the semantic matching between Pattern and Relation."
- `slides` <http://research.microsoft.com/pubs/244749/ACL-15-STAGG_deck.pptx>
> "We develop a simple log-linear model, in the spirit of traditional web-
based QA systems, that answers questions by querying the web and
extracting the answer from returned web snippets. Thus, our evaluation
scheme is suitable for semantic parsing benchmarks in which the knowledge
required for answering questions is covered by the web (in contrast with
virtual assistants for which the knowledge is specific to an application)."
---
[**interesting recent papers**](https://github.com/brylevkirill/notes/blob/master/interesting%20recent%20papers.md#reasoning)
----
- <https://github.com/brylevkirill/notes/blob/master/Natural%20Language%20Processing.md#language-models-are-unsupervised-multitask-learners-radford-wu-child-luan-amodei-sutskever>
`unsupervised learning`
- `post` <https://ai.facebook.com/blog/research-in-brief-unsupervised-question-answering-by-cloze-translation>
#### ["A Neural Network for Factoid Question Answering over Paragraphs"]
(http://cs.umd.edu/~miyyer/qblearn/) Iyyer, Boyd-Graber, Claudino, Socher,
Daume
`QANTA`
> "We use a specially designed dataset that challenges humans: a trivia
game called Quiz bowl. These questions are written so that they can be
interrupted by someone who knows more about the answer; that is, harder
clues are at the start of the question and easier clues are at the end of the
question. The content model produces guesses of what the answer could be
and the policy must decide when to accept the guess.
Quiz bowl is a fun game with excellent opportunities for outreach, but it is
also related to core challenges in natural language processing: classification
(sorting inputs and making predictions), discourse (using pragmatic clues to
guess what will come next), and coreference resolution (knowing which
entities are discussed from oblique mentions)."
- <http://qanta.org>
- <http://umiacs.umd.edu/~jbg/projects/IIS-1320538.html>
- `video` <http://youtube.com/watch?v=LqsUaprYMOw> + <http://youtube.com/watch?v=-jbqiXvmY9w> (exhibition match against team of Jeopardy champions)
- `video` <http://youtube.com/watch?v=kTXJCEvCDYk> (exhibition match against Ken Jennings)
- `video` <http://videolectures.net/deeplearning2015_socher_nlp_applications/#t=540> (Socher)
- `video` <http://youtube.com/watch?v=hqGU-6ZPQzw> + <http://youtube.com/watch?v=OhK5dY_W4Jc> (Boyd-Graber)
- `audio` <https://soundcloud.com/nlp-highlights/72-the-anatomy-question-answering-task-with-jordan-boyd-graber> (Boyd-Graber)
- `poster` <http://emnlp2014.org/material/poster-EMNLP2014070.pdf> (technical overview)
- `code` <https://github.com/Pinafore/qb>
- `code` <https://github.com/miyyer/qb>
- `code` <https://github.com/jcoreyes/NLQA/tree/master/qanta>
#### ["Key-Value Memory Networks for Directly Reading Documents"]
(https://arxiv.org/abs/1606.03126) Miller, Fisch, Dodge, Karimi, Bordes,
Weston
> "Directly reading documents and being able to answer questions from
them is a key problem. To avoid its inherent difficulty, question answering
has been directed towards using Knowledge Bases instead, which has proven
effective. Unfortunately KBs suffer from often being too restrictive, as the
schema cannot support certain types of answers, and too sparse, e.g.
Wikipedia contains much more information than Freebase. In this work we
introduce a new method, Key-Value Memory Networks, that makes reading
documents more viable by utilizing different encodings in the addressing and
output stages of the memory read operation. To compare using KBs,
information extraction or Wikipedia documents directly in a single framework
we construct an analysis tool, MOVIEQA, a QA dataset in the domain of
movies. Our method closes the gap between all three settings. It also
achieves state-of-the-art results on the existing WIKIQA benchmark."
- `video` <http://techtalks.tv/talks/key-value-memory-networks-for-directly-reading-documents/63333/> (Miller)
- `video` <http://videolectures.net/deeplearning2016_chopra_attention_memory/#t=4038>
- `video` <https://youtu.be/x1kf4Zojtb0?t=25m46s> (de Freitas)
- `notes` <http://www.shortscience.org/paper?bibtexKey=journals/corr/1606.03126>
- `notes` <https://gist.github.com/shagunsodhani/a5e0baa075b4a917c0a69edc575772a8>
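The key-value read operation described in the abstract reduces to attention over keys followed by a weighted sum of values. A sketch with hand-built one-hot keys so the result is deterministic (in the model, keys and values are learned encodings, e.g. of text windows and their centre words):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kv_memory_read(query, keys, values):
    # addressing uses the keys; the returned representation uses the values
    attention = softmax(keys @ query)
    return attention @ values

# toy memory: 5 slots, dimension 8; one-hot keys for determinism
rng = np.random.default_rng(0)
keys = np.eye(5, 8)
values = rng.normal(size=(5, 8))
query = 50.0 * keys[2]          # a query that matches slot 2's key

out = kv_memory_read(query, keys, values)
# attention is nearly all on slot 2, so the read returns its value
assert np.allclose(out, values[2], atol=1e-6)
```

Decoupling the addressing representation (key) from the output representation (value) is what lets the same read mechanism work over KB triples, extracted facts, or raw document windows.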
> "This paper proposes to tackle open- domain question answering using
Wikipedia as the unique knowledge source: the answer to any factoid
question is a text span in a Wikipedia article. This task of machine reading at
scale combines the challenges of document retrieval (finding the relevant
articles) with that of machine comprehension of text (identifying the answer
spans from those articles). Our approach combines a search component
based on bigram hashing and TF-IDF matching with a multi-layer recurrent
neural network model trained to detect answers in Wikipedia paragraphs.
Our experiments on multiple existing QA datasets indicate that (1) both
modules are highly competitive with respect to existing counterparts and (2)
multitask learning using distant supervision on their combination is an
effective complete system on this challenging task."
- `code` <https://github.com/facebookresearch/ParlAI/tree/master/parlai/agents/drqa>
- `code` <https://github.com/hitvoice/DrQA>
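The retrieval half of the pipeline can be approximated in a few lines of TF-IDF over bigrams (the system described above additionally hashes bigrams into a fixed-size space, which is omitted here); the mini-corpus standing in for Wikipedia is invented:

```python
import math
from collections import Counter

# toy corpus standing in for Wikipedia articles
docs = {
    "Paris": "paris is the capital of france",
    "Warsaw": "warsaw is the capital of poland",
    "Oak": "the oak is a common tree",
}

def bigrams(text):
    toks = text.split()
    return list(zip(toks, toks[1:]))

tf = {name: Counter(bigrams(text)) for name, text in docs.items()}
df = Counter(g for c in tf.values() for g in c)   # document frequency
n = len(docs)

def tfidf_score(query, doc_name):
    # unnormalised TF-IDF match between query bigrams and one document
    return sum(tf[doc_name][g] * math.log(n / df[g])
               for g in bigrams(query) if g in df)

query = "what is the capital of france"
best = max(docs, key=lambda name: tfidf_score(query, name))
assert best == "Paris"
```

The top-ranked articles are then handed to the reader network, which predicts an answer span within them.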
#### ["Text Understanding with the Attention Sum Reader Network"]
(http://arxiv.org/abs/1603.01547) Kadlec, Schmid, Bajgar, Kleindienst
> "The words from the document and the question are first converted
into vector embeddings using a look-up matrix V. The document is then read
by a bidirectional GRU network. A concatenation of the hidden states of the
forward and backward GRUs at each word is then used as a contextual
embedding of this word, intuitively representing the context in which the
word is appearing. We can also understand it as representing the set of
questions to which this word may be an answer. Similarly the question is
read by a bidirectional GRU but in this case only the final hidden states are
concatenated to form the question embedding. The attention over each word
in the context is then calculated as the dot product of its contextual
embedding with the question embedding. This attention is then normalized
by the softmax function. While most previous models used this attention as
weights to calculate a blended representation of the answer word, we simply
sum the attention across all occurrences of each unique word and then
simply select the word with the highest sum as the final answer. While
simple, this trick seems both to improve accuracy and to speed-up training."
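The attention-sum step in the quote above can be sketched as follows, with one-hot stand-ins for the GRU-produced contextual and question embeddings so the example is deterministic (the document, question, and vectors are all invented):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# toy stand-ins for the GRU outputs: one contextual vector per document
# position and one question vector (one-hot by word type for determinism)
doc_tokens = ["mary", "went", "to", "paris", "and", "mary", "smiled"]
vocab = sorted(set(doc_tokens))
onehot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
contextual = np.stack([onehot[w] for w in doc_tokens])
question = 5.0 * onehot["mary"]          # question "points at" mary

# dot-product attention over positions, normalised by softmax
attention = softmax(contextual @ question)

# attention sum: pool the attention over occurrences of each unique word,
# then pick the word with the highest pooled score as the answer
scores = {}
for w, a in zip(doc_tokens, attention):
    scores[w] = scores.get(w, 0.0) + float(a)

answer = max(scores, key=scores.get)
assert answer == "mary"
```

Pooling over repeated occurrences is the whole trick: a frequently mentioned candidate accumulates attention that a blended-representation model would spread thin.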
> "We have shown that simply infusing a model with more data can yield
performance improvements of up to 14.8% where several attempts to
improve the model architecture on the same training data have given gains
of at most 2.1% compared to our best ensemble result."
> "If we move model training from joint CBT NE+CN training data to a
subset of the BookTest of the same size (230k examples), we see a drop in
accuracy of around 10% on the CBT test datasets. Hence even though the
Children’s Book Test and BookTest datasets are almost as close as two
disjoint datasets can get, the transfer is still very imperfect. This also
suggests that the increase in accuracy when using more data that are strictly
in the same domain as the original training data results in a performance
increase even larger than the one we are reporting on CBT. However the
scenario of having to look for additional data elsewhere is more realistic."
- `code` <https://github.com/cairoHy/attention-sum-reader>
- `paper` ["Embracing data abundance: BookTest Dataset for Reading
Comprehension"](https://arxiv.org/abs/1610.00956) by Bajgar, Kadlec,
Kleindienst
> "Open domain Question Answering systems must interact with external
knowledge sources, such as web pages, to find relevant information.
Information sources like Wikipedia, however, are not well structured and
difficult to utilize in comparison with Knowledge Bases. In this work we
present a two-step approach to question answering from unstructured text,
consisting of a retrieval step and a comprehension step. For comprehension,
we present an RNN based attention model with a novel mixture mechanism
for selecting answers from either retrieved articles or a fixed vocabulary. For
retrieval we introduce a hand-crafted model and a neural model for ranking
relevant articles. We achieve state-of-the-art performance on WIKIMOVIES
dataset, reducing the error by 40%. Our experimental results further
demonstrate the importance of each of the introduced components."
- <https://theneuralperspective.com/2017/04/26/question-answering-from-unstructured-text-by-retrieval-and-comprehension/>
- <https://soundcloud.com/nlp-highlights/13a>
`BiDAF`
- `code` <https://github.com/allenai/allennlp/tree/master/allennlp/models/reading_comprehension>
- `post` <https://ai.googleblog.com/2018/10/open-sourcing-active-question.html>
- `video` <https://facebook.com/iclr.cc/videos/2125495797479475?t=3836> (Bulian, Houlsby)
- `code` <https://github.com/google/active-qa>
- `code` <https://github.com/nyu-dl/QueryReformulator>
---
[**interesting recent papers**](https://github.com/brylevkirill/notes/blob/master/interesting%20recent%20papers.md#reasoning)
----
> "Aim:
- `video`
<https://facebook.com/nipsfoundation/videos/1554402331317667?t=1731>
- `poster` <https://rockt.github.io/pdf/rocktaschel2017end-poster.pdf>
- `slides` <https://rockt.github.io/pdf/rocktaschel2017end-slides.pdf>
(Rocktaschel)
- `slides` <http://aitp-conference.org/2017/slides/Tim_aitp.pdf>
(Rocktaschel)
- `slides` <http://on-demand.gputechconf.com/gtc-eu/2017/presentation/23372-tim-rocktäschel-gpu-accelerated-deep-neural-networks-for-end-to-end-differentiable-planning-and-reasoning.pdf> (Rocktaschel)
- `code` <https://github.com/uclmr/ntp>
> "In this paper, we show how the state-of-the-art in recognizing textual
entailment on a large, human-curated and annotated corpus, can be
improved with general end-to-end differentiable models. Our results
demonstrate that LSTM recurrent neural networks that read pairs of
sequences to produce a final representation from which a simple classifier
predicts entailment, outperform both a neural baseline as well as a classifier
with hand-engineered features. Furthermore, extending these models with
attention over the premise provides further improvements to the predictive
abilities of the system, resulting in a new state-of-the-art accuracy for
recognizing entailment on the Stanford Natural Language Inference corpus.
The models presented here are general sequence models, requiring no
appeal to natural language specific processing beyond tokenization, and are
therefore a suitable target for transfer learning through pre-training the
recurrent systems on other corpora, and conversely, applying the models
trained on this corpus to other entailment tasks. Future work will focus on
such transfer learning tasks, as well as scaling the methods presented here
to larger units of text (e.g. paragraphs and entire documents) using
hierarchical attention mechanisms. Furthermore, we aim to investigate the
application of these generic models to non-natural language sequential
entailment problems."
- `slides` <http://egrefen.com/docs/HowMuchLinguistics2015.pdf>
- `code` <https://github.com/borelien/entailment-neural-attention-lstm-tf>
> "NLP tasks differ in the semantic information they require, and at this
time no single semantic representation fulfills all requirements. Logic-based
representations characterize sentence structure, but do not capture the
graded aspect of meaning. Distributional models give graded similarity
ratings for words and phrases, but do not adequately capture overall
sentence structure. So it has been argued that the two are complementary.
In this paper, we adopt a hybrid approach that combines logic-based and
distributional semantics through probabilistic logic inference in Markov Logic
Networks. We focus on textual entailment, a task that can utilize the
strengths of both representations. Our system has three components: 1)
parsing and task representation, where input RTE problems are represented
in probabilistic logic. This is quite different from representing them in
standard first-order logic. 2) knowledge base construction in the form of
weighted inference rules from different sources like WordNet, paraphrase
collections, and lexical and phrasal distributional rules generated on the fly.
We use a variant of Robinson resolution to determine the necessary
inference rules. More sources can easily be added by mapping them to
logical rules; our system learns a resource-specific weight that counteracts
scaling differences between resources. 3) inference, where we show how to
solve the inference problems efficiently. In this paper we focus on the SICK
dataset, and we achieve a state-of-the-art result. Our system handles overall
sentence structure and phenomena like negation in the logic, then uses our
Robinson resolution variant to query distributional systems about words and
short phrases. Therefore, we use our system to evaluate distributional lexical
entailment approaches. We also publish the set of rules queried from the
SICK dataset, which can be a good resource to evaluate them."
> "In the Inference step, automated reasoning for MLNs is used to
perform the RTE task. We implement an MLN inference algorithm that
directly supports querying complex logical formula, which is not supported in
the available MLN tools. We exploit the closed-world assumption to help
reduce the size of the inference problem in order to make it tractable."
----
> "Previous work has examined the use of Markov Logic Networks to
represent the requisite background knowledge and interpret test questions,
but did not improve upon an information retrieval baseline. In this paper, we
describe an alternative approach that operates at three levels of
representation and reasoning: information retrieval, corpus statistics, and
simple inference over a semi-automatically constructed knowledge base, to
achieve substantially improved results. We evaluate the methods on six
years of unseen, unedited exam questions from the NY Regents Science
Exam (using only non-diagram, multiple choice questions), and show that our
overall system’s score is 71.3%, an improvement of 23.8% (absolute) over
the MLN-based method described in previous work."
- `slides` <https://github.com/clulab/nlp-reading-group/raw/master/fall-2015-resources/Markov%20Logic%20Networks%20for%20Natural%20Language%20Question%20Answering.pdf>
> "We treat question answering as the task of pairing the question with
an answer such that this pair has the best support in the knowledge base,
measured in terms of the strength of a “support graph”. Informally, an edge
denotes (soft) equality between a question or answer node and a table node,
or between two table nodes. To account for lexical variability (e.g., that tool
and instrument are essentially equivalent) and generalization (e.g., that a
dog is an animal), we replace string equality with a phrase-level entailment
or similarity function. A support graph thus connects the question
constituents to a unique answer option through table cells and (optionally)
table headers corresponding to the aligned cells. A given question and tables
give rise to a large number of possible support graphs, and the role of the
inference process will be to choose the “best” one under a notion of
desirable support graphs developed next. We do this through a number of
additional structural and semantic properties; the more properties the
support graph satisfies, the more desirable it is."
- `code` <https://github.com/allenai/tableilp>
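The support-graph scoring above can be hinted at with a toy greedy score. This is only a sketch of the objective: question constituents and the answer option are softly aligned to cells of the same table row via a similarity function (the crude word-overlap `sim` below is a hypothetical stand-in for the phrase-level entailment model, and the real system chooses the best support graph with an ILP rather than this greedy scan):

```python
def support_score(question_tokens, answer, table_rows, sim):
    """Toy stand-in for support-graph scoring: an answer is preferred when
    some table row both aligns well with the question constituents and
    contains a cell matching the answer option."""
    best = 0.0
    for row in table_rows:
        # soft alignment of each question constituent to its best-matching cell
        q_support = sum(max(sim(q, cell) for cell in row) for q in question_tokens)
        # the answer must also be linked to a cell of the same row
        a_support = max(sim(answer, cell) for cell in row)
        best = max(best, q_support * a_support)
    return best

def sim(a, b):
    # crude similarity: word-overlap Jaccard, a placeholder for entailment
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)
```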
> "We propose a novel method for exploiting the semantic structure of
text to answer multiple-choice questions. The approach is especially suitable
for domains that require reasoning over a diverse set of linguistic constructs
but have limited training data. To address these challenges, we present the
first system, to the best of our knowledge, that reasons over a wide range of
semantic abstractions of the text, which are derived using off-the-shelf,
general-purpose, pre-trained natural language modules such as semantic
role labelers, coreference resolvers, and dependency parsers. Representing
multiple abstractions as a family of graphs, we translate question answering
into a search for an optimal subgraph that satisfies certain global and local
properties. This formulation generalizes several prior structured QA systems.
Our system, SemanticILP, demonstrates strong performance on two domains
simultaneously. In particular, on a collection of challenging science QA
datasets, it outperforms various state-of-the-art approaches, including neural
models, broad coverage information retrieval, and specialized techniques
using structured knowledge bases, by 2%-6%."
- `code` <https://github.com/allenai/semanticilp>
---
[**selected papers**](https://yadi.sk/d/5WLsH_nd3ZUJU4)
----
- <https://github.com/brylevkirill/notes/blob/master/Knowledge%20Representation%20and%20Reasoning.md#alexandria-unsupervised-high-precision-knowledge-base-construction-using-a-probabilistic-program-winn-et-al>
> "The task of data fusion is to identify the true values of data items
(e.g., the true date of birth for Tom Cruise) among multiple observed values
drawn from different sources (e.g., Web sites) of varying (and unknown)
reliability. A recent survey has provided a detailed comparison of various
fusion methods on Deep Web data. In this paper, we study the applicability
and limitations of different fusion techniques on a more challenging problem:
knowledge fusion. Knowledge fusion identifies true subject-predicate-object
triples extracted by multiple information extractors from multiple information
sources. These extractors perform the tasks of entity linkage and schema
alignment, thus introducing an additional source of noise that is quite
different from that traditionally considered in the data fusion literature, which
only focuses on factual errors in the original sources. We adapt state-of-the-
art data fusion techniques and apply them to a knowledge base with 1.6B
unique knowledge triples extracted by 12 extractors from over 1B Web
pages, which is three orders of magnitude larger than the data sets used in
previous data fusion papers. We show great promise of the data fusion
approaches in solving the knowledge fusion problem, and suggest interesting
research directions through a detailed error analysis of the methods."
- `slides` <http://lunadong.com/talks/fromDFtoKF.pdf>
> "...these knowledge bases are greatly incomplete. For example, over 70% of
people included in Freebase have no known place of birth, and 99% have no
known ethnicity. In this paper, we propose a way to leverage existing Web-
search–based question-answering technology to fill in the gaps in knowledge
bases in a targeted way. In particular, for each entity attribute, we learn the
best set of queries to ask, such that the answer snippets returned by the
search engine are most likely to contain the correct value for that attribute.
For example, if we want to find Frank Zappa’s mother, we could ask the
query who is the mother of Frank Zappa. However, this is likely to return ‘The
Mothers of Invention’, which was the name of his band. Our system learns
that it should (in this case) add disambiguating terms, such as Zappa’s place
of birth, in order to make it more likely that the search results contain
snippets mentioning his mother. Our system also learns how many different
queries to ask for each attribute, since in some cases, asking too many can
hurt accuracy (by introducing false positives). We discuss how to aggregate
candidate answers across multiple queries, ultimately returning probabilistic
predictions for possible values for each attribute. Finally, we evaluate our
system and show that it is able to extract a large number of facts with high
confidence."
> "We show that relation extraction can be reduced to answering simple
reading comprehension questions, by associating one or more natural-
language questions with each relation slot. This reduction has several
advantages: we can (1) learn relation extraction models by extending recent
neural reading-comprehension techniques, (2) build very large training sets
for those models by combining relation-specific crowd-sourced questions
with distant supervision, and even (3) do zero-shot learning by extracting
new relation types that are only specified at test-time, for which we have no
labeled training examples. Experiments on a Wikipedia slot-filling task
demonstrate that the approach can generalize to new questions for known
relation types with high accuracy, and that zero-shot generalization to
unseen relation types is possible, at lower accuracy levels, setting the bar for
future work on this task."
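The reduction can be illustrated with a toy template table and a stand-in QA model. Both are hypothetical: the paper uses crowd-sourced question templates and a neural reading-comprehension model, not this keyword heuristic:

```python
# relation slots expressed as natural-language question templates
TEMPLATES = {
    "place_of_birth": "Where was {} born?",
    "educated_at": "Where did {} study?",
}

def extract_relation(relation, entity, sentence, qa_model):
    """Reduce relation extraction to reading comprehension: ask the relation's
    question against the sentence. The QA model returns an answer span, or
    None when the relation is absent. Zero-shot extraction then only requires
    writing a question template for the new, unseen relation type."""
    question = TEMPLATES[relation].format(entity)
    return qa_model(question, sentence)

def naive_span_qa(question, sentence):
    # hypothetical stand-in QA model: answer with the token following "in"
    words = sentence.rstrip(".").split()
    return words[words.index("in") + 1] if "in" in words else None
```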
> "We describe Stanford’s entry in the TACKBP 2014 Slot Filling
challenge. We submitted two broad approaches to Slot Filling: one based on
the DeepDive framework (Niu et al., 2012), and another based on the multi-
instance multi-label relation extractor of Surdeanu et al. (2012). In addition,
we evaluate the impact of learned and hard-coded patterns on performance
for slot filling, and the impact of the partial annotations described in Angeli
et al. (2014)."
> "We describe Stanford’s two systems in the 2014 KBP Slot Filling
competition. The first, and best performing system, is built on top of the
DeepDive framework. The central lesson we would like to emphasize from
this system is that leveraging large computers allows for completely
removing the information retrieval component of a traditional KBP system,
and allows for quick turnaround times while processing the entire source
corpus as a single unit. DeepDive offers a convenient framework for
developing systems on these large computers, including defining the pre-
processing pipelines (feature engineering, entity linking, mention detection,
etc.) and then defining and training a relation extraction model. The second
system Stanford submitted is based around the MIML-RE relation extractor,
following closely from the 2013 submission, but with the addition of learned
patterns, and with MIML-RE trained with carefully selected, manually
annotated sentences held fixed. The central lesson we would like to emphasize from
this system is that a relatively small annotation effort (10k sentences) over
carefully selected examples can yield a surprisingly large gain in end-to-end
performance on the Slot Filling task."
> "In DeepDive, calibration plots are used to summarize the overall
quality of the results. Because DeepDive uses a joint probability model, each
random variable is assigned a marginal probability. Ideally, if one takes all
the facts to which DeepDive assigns a probability score of 0.95, then 95% of
these facts are correct. We believe that probabilities remove a key element:
the developer reasons about features, not the algorithms underneath. This is
a type of algorithm independence that we believe is critical."
- `code` <https://github.com/karthikncode/DeepRL-InformationExtraction>
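The calibration check described in the DeepDive quote above can be sketched as a generic bucketing routine (this is not DeepDive's implementation, just the idea: in a calibrated model, facts assigned probability 0.95 should be correct about 95% of the time):

```python
import numpy as np

def calibration_table(probs, labels, bins=10):
    """Bucket predictions by their assigned marginal probability and compare
    each bucket's mean predicted probability with the empirical fraction of
    correct facts; a well-calibrated model has the two close in every bucket."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi < 1.0:
            mask = (probs >= lo) & (probs < hi)
        else:
            mask = (probs >= lo) & (probs <= hi)  # last bucket is inclusive
        if mask.any():
            rows.append({"bucket": (lo, hi),
                         "mean_prob": float(probs[mask].mean()),
                         "accuracy": float(labels[mask].mean()),
                         "count": int(mask.sum())})
    return rows
```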
> "Search engines make significant efforts to recognize queries that can
be answered by structured data and invest heavily in creating and
maintaining high-precision databases. While these databases have a
relatively wide coverage of entities, the number of attributes they model
(e.g., GDP, CAPITAL, ANTHEM) is relatively small. Extending the number of
attributes known to the search engine can enable it to more precisely answer
queries from the long and heavy tail, extract a broader range of facts from
the Web, and recover the semantics of tables on the Web. We describe
Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct
attribute names. Biperpedia extracts attributes from the query stream, and
then uses the best extractions to seed attribute extraction from text. For
every attribute Biperpedia saves a set of synonyms and text patterns in
which it appears, thereby enabling it to recognize the attribute in more
contexts. In addition to a detailed analysis of the quality of Biperpedia, we
show that it can increase the number of Web tables whose semantics we can
recover by more than a factor of 4 compared with Freebase."
> "In this paper we introduce PIDGIN, a novel, flexible, and scalable
approach to automatic alignment of real-world KB ontologies, demonstrating
its superior performance at aligning large real-world KB ontologies including
those of NELL, Yago and Freebase. The key idea in PIDGIN is to align KB
ontologies by integrating two types of information: relation instances that are
shared by the two KBs, and mentions of the KB relation instances across a
large text corpus. PIDGIN uses a natural language web text corpus of 500
million dependency-parsed documents as interlingua and graph-based
self-supervised learning to infer alignments. To the best of our knowledge, this is
the first successful demonstration of using such a large text resource for
ontology alignment. PIDGIN is self-supervised, and does not require human
labeled data. Moreover, PIDGIN can be implemented in MapReduce, making
it suitable for aligning ontologies from large KBs. We have provided extensive
experimental results on multiple real world datasets, demonstrating that
PIDGIN significantly outperforms PARIS, the current state-of-the-art approach
to ontology alignment. We observe in particular that PIDGIN is typically able
to improve recall over that of PARIS, without degradation in precision. This is
presumably due to PIDGIN’s ability to use text-based interlingua to establish
alignments when there are few or no relation instances shared by the two
KBs. Additionally, PIDGIN automatically learns which verbs are associated
with which ontology relations. These verbs can be used in the future to
extract new instances to populate the KB or identify relations between
entities in documents. PIDGIN can also assign relations in one KB with
argument types of another KB. This can help type relations that do not yet
have argument types, like that of KBP. Argument typing can improve the
accuracy of extraction of new relation instances by constraining the
instances to have the correct types. In the future, we plan to extend PIDGIN’s
capabilities to provide explanations for its inferred alignments. We also plan
to experiment with aligning ontologies from more than two KBs
simultaneously."
- `code` <https://github.com/kushalarora/pidgin>
> "We consider the question of how unlabeled data can be used to
estimate the true accuracy of learned classifiers. This is an important
question for any autonomous learning system that must estimate its
accuracy without supervision, and also when classifiers trained from one
data distribution must be applied to a new distribution (e.g., document
classifiers trained on one text corpus are to be applied to a second corpus).
We first show how to estimate error rates exactly from unlabeled data when
given a collection of competing classifiers that make independent errors,
based on the agreement rates between subsets of these classifiers. We
further show that even when the competing classifiers do not make
independent errors, both their accuracies and error dependencies can be
estimated by making certain relaxed assumptions. Experiments on two real-
world data sets produce estimates within a few percent of the true accuracy,
using solely unlabeled data. These results are of practical significance in
situations where labeled data is scarce and shed light on the more general
question of how the consistency among multiple functions is related to their
true accuracies."
- `code` <https://github.com/eaplatanios/makina>
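The exact-recovery result for independent errors can be checked numerically in the binary case. With error rates e_i below 0.5 and s_i = 1 - 2 e_i, pairwise agreement satisfies a_ij = (1 - e_i)(1 - e_j) + e_i e_j, so s_i s_j = 2 a_ij - 1 and three pairwise agreements pin down all three error rates without any labels (a simplified sketch of the paper's setting, on simulated classifiers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_err = np.array([0.1, 0.2, 0.3])

# simulate three classifiers that are independently correct w.p. 1 - e_i
correct = rng.random((3, n)) > true_err[:, None]

def agreement(i, j):
    # fraction of examples where classifiers i and j give the same label
    # (both correct or both wrong)
    return float(np.mean(correct[i] == correct[j]))

# a_ij = (1-e_i)(1-e_j) + e_i e_j  =>  2 a_ij - 1 = s_i s_j with s_i = 1 - 2 e_i
c01 = 2 * agreement(0, 1) - 1
c02 = 2 * agreement(0, 2) - 1
c12 = 2 * agreement(1, 2) - 1
s = np.sqrt([c01 * c02 / c12, c01 * c12 / c02, c02 * c12 / c01])
est_err = (1 - s) / 2  # recovered error rates, computed from unlabeled data only
```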
`truth finding`
> "The quality of web sources has been traditionally evaluated using
exogenous signals such as the hyperlink structure of the graph. We propose
a new approach that relies on endogenous signals, namely, the correctness
of factual information provided by the source. A source that has few false
facts is considered to be trustworthy. The facts are automatically extracted
from each source by information extraction methods commonly used to
construct knowledge bases. We propose a way to distinguish errors made in
the extraction process from factual errors in the web source per se, by using
joint inference in a novel multi-layer probabilistic model. We call the
trustworthiness score we computed Knowledge-Based Trust. On synthetic
data, we show that our method can reliably compute the true
trustworthiness levels of the sources. We then apply it to a database of 2.8B
facts extracted from the web, and thereby estimate the trustworthiness of
119M webpages. Manual evaluation of a subset of the results confirms the
effectiveness of the method."
> "How can we estimate the trustworthiness of a webpage when we
don't know the truth? (cf. crowdsourcing)"
> "We can formulate a latent variable model and use EM."
> "But we must be careful to distinguish errors in the source from errors
in the extraction systems."
> "This paper proposes a new metric for evaluating web-source quality -
knowledge-based trust. We proposed a sophisticated probabilistic model that
jointly estimates the correctness of extractions and source data, and the
trustworthiness of sources. In addition, we presented an algorithm that
dynamically decides the level of granularity for each source."
> "challenges:
> "strategies:
> "algorithm:
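The latent-variable-plus-EM idea mentioned above can be sketched with a toy Dawid-Skene-style model: facts' true values are latent, source accuracies are parameters, and the two are estimated jointly. This is only a minimal sketch of the principle; the paper's multi-layer model additionally separates extractor errors from source errors:

```python
def truth_finding_em(observations, n_iter=20):
    """observations: dict fact -> list of (source, claimed_value), values in {0, 1}.
    Jointly estimates P(fact = 1) and each source's accuracy with EM."""
    sources = {s for obs in observations.values() for s, _ in obs}
    acc = {s: 0.8 for s in sources}            # initial guess at source accuracy
    belief = {f: 0.5 for f in observations}    # P(true value = 1) per fact
    for _ in range(n_iter):
        # E-step: posterior over each fact's true value given source accuracies
        for f, obs in observations.items():
            p1 = p0 = 1.0
            for s, v in obs:
                p1 *= acc[s] if v == 1 else 1 - acc[s]
                p0 *= acc[s] if v == 0 else 1 - acc[s]
            belief[f] = p1 / (p1 + p0)
        # M-step: accuracy = expected fraction of claims matching the latent truth
        num = {s: 0.0 for s in sources}
        den = {s: 0.0 for s in sources}
        for f, obs in observations.items():
            for s, v in obs:
                num[s] += belief[f] if v == 1 else 1.0 - belief[f]
                den[s] += 1.0
        acc = {s: num[s] / den[s] for s in sources}
    return belief, acc
```

Sources that consistently agree with the consensus end up with high estimated accuracy, which in turn sharpens the beliefs about the facts they report.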
`truth finding`
> "The World Wide Web has become a rapidly growing platform
consisting of numerous sources which provide supporting or contradictory
information about claims (e.g., “Chicken meat is healthy”). In order to decide
whether a claim is true or false, one needs to analyze content of different
sources of information on the Web, measure credibility of information
sources, and aggregate all these information. This is a tedious process and
the Web search engines address only part of the overall problem, viz.,
producing only a list of relevant sources. In this paper, we present ClaimEval,
a novel and integrated approach which given a set of claims to validate,
extracts a set of pro and con arguments from the Web information sources,
and jointly estimates credibility of sources and correctness of claims.
ClaimEval uses Probabilistic Soft Logic, resulting in a flexible and principled
framework which makes it easy to state and incorporate different forms of
prior knowledge. Through extensive experiments on real-world datasets, we
demonstrate ClaimEval’s capability in determining validity of a set of claims,
resulting in improved accuracy compared to state-of-the-art baselines."