Wikidata:Property proposal/dependency grammar relations
Dependency grammar relations
relationship to head in syntactic dependency
Originally proposed at Wikidata:Property proposal/Lexemes
position of head in syntactic dependency
Originally proposed at Wikidata:Property proposal/Lexemes
Description | position (as a value for series ordinal (P1545) on a "combines" statement) of the head to which the syntactic dependency relationship qualifying this combines lexemes (P5238) value points |
---|---|
Data type | String |
Allowed values | values of series ordinal (P1545) on other combines lexemes (P5238) values on the lexeme |
Example 1 | Using the current values of combines lexemes (P5238) on cool as a cucumber (L559269) as an example:
|
Example 2 | Using the values of combines lexemes (P5238) on মাছের তেলে মাছ ভাজা (L314705) as an example:
|
Example 3 | Using possible values of combines lexemes (P5238) on som man reder, ligger man (L345524) as an example, assigning them series ordinal (P1545) values 1 through 5 in order:
|
Example 4 | Using possible values of combines lexemes (P5238) on sich einen feuchten Kehricht um etwas kümmern (L46007) as an example, assigning them series ordinal (P1545) values 1 through 7 in order (I am de-0, so the relationships might not be quite correct):
|
Planned use | addition to combines lexemes (P5238) values on multi-term lexemes |
See also | object of statement has role (P3831) |
Motivation
This is an attempt to provide qualifiers to combines lexemes (P5238) in aiding the construction of syntactic trees for multi-term lexemes in a dependency grammar framework (with an adaptation of Universal Dependencies in mind here):
- The first property is intended to indicate the particular relationship between the lexeme being qualified (the dependent) and the lexeme on which it depends (the head). A first attempt at a mapping, made by @Tpt three years ago, exists, but a set of items for better alignment with the UD relation set (or deviations therefrom where they may be needed) should be determined. (The items used in the examples for the first qualifier are the closest equivalents I could think of at the moment I wrote this proposal; they certainly may not be the ones actually used.)
- The second property, ideally, would be a pointer to the actual "combines" statement containing the head of the relationship in which the lexeme is taking part, but absent such a datatype the next best thing would be to use the series ordinal (P1545) values on other combines lexemes (P5238) statements as a guide. As it is largely already the case that combines lexemes (P5238) statements are qualified with series ordinal (P1545) (6989 out of 9447 lexemes), this would fit in nicely with those qualifiers, and the existence of a proper tree relationship among the components could be checked on the client side if desired.
Suggestions for improvement of both of these properties (or suggestions of an alternate manner of representing dependency grammar relationships) welcome. Mahir256 (talk) 01:23, 1 July 2021 (UTC)
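The client-side tree check mentioned in the second point could look roughly like the sketch below. It is only an illustration, not part of the proposal: the function name `is_dependency_tree` and the plain dictionary layout are hypothetical, the root is assumed to be the one component with no head position, and the head of "a" (position 3) is an assumed value, while the heads of "as" and "cucumber" follow the cool as a cucumber (L559269) example.

```python
from typing import Dict, Optional


def is_dependency_tree(head_of: Dict[str, Optional[str]]) -> bool:
    """Return True if the head pointers form a single rooted tree.

    Keys are series ordinal (P1545) values of the components; values are
    the proposed "position of head" values, or None for the root
    (assumption: the root simply carries no head position).
    """
    # Exactly one component may be without a head.
    roots = [pos for pos, head in head_of.items() if head is None]
    if len(roots) != 1:
        return False

    # Follow head pointers from every component; each walk must end at the root.
    for start in head_of:
        seen = set()
        pos = start
        while head_of[pos] is not None:
            if pos in seen:
                return False  # cycle among the components
            seen.add(pos)
            pos = head_of[pos]
            if pos not in head_of:
                return False  # head position matches no series ordinal
    return True


# "cool as a cucumber": cool=1, as=2, a=3, cucumber=4.
# as -> 4 and cucumber -> 1 follow the example; a -> 4 is an assumption.
cool_as_a_cucumber = {"1": None, "2": "4", "3": "4", "4": "1"}

# The same lexeme with a broken annotation: no component is left as root.
broken = {"1": "4", "2": "4", "3": "4", "4": "1"}

print(is_dependency_tree(cool_as_a_cucumber))
print(is_dependency_tree(broken))
```

Because the values are strings, the check compares ordinals as strings and never assumes they are consecutive integers.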
Discussion
- Question Just out of curiosity, what applications does this data have? And why would these applications prefer this data over automated taggers trained on already published UD data? — Robert Važan (talk) 10:22, 1 July 2021 (UTC)
- @Robert Važan: This is intended for Abstract Wikipedia renderers that generate text by constructing syntax trees based on a dependency grammar. The overall dependency relation set is not expected to be aligned completely with UD (indeed, why would it be, when it predates the concept of Wikidata lexemes and when these trees for multi-term lexemes should be manipulable in the course of text generation), and the outputs of taggers that do exist for some languages (perhaps all those noted on UD's home page) may not coincide with the dependency relation set we choose. It is thus not intended for entirely UD-compliant annotation of existing data, not just because for some languages (such as Breton and Kurmanji) the UD corpora that do exist are much smaller (which would affect the quality of the tagger used), but also because for the five Abstract Wikipedia focus languages, in addition to many others for which we have lexemes, UD corpora are to my knowledge nonexistent. (One of course is welcome to get inspired by the corpora that do exist in the course of annotating lexemes with this information.) Mahir256 (talk) 12:23, 1 July 2021 (UTC)
- Support Tree model of phrases. Useful in NLP. Most of the data can be generated semi-automatically. — Robert Važan (talk) 14:58, 1 July 2021 (UTC)
- Question Hi, can you throw together an example query in your proposal Motivation that shows the output of the "list of ordered elements" for L559269, or whatever representation you are thinking of? My SPARQL is rusty here; would it just use ORDER BY P1545 or something? A few hypothetical hand-waving SPARQL queries and their likely result representation would help me.
- (Responding inline for clarity.) I just added Sandbox-String (P370) and Sandbox-Item (P369) to the components of মাছের তেলে মাছ ভাজা (L314705) and am now able to graph the connections and the types of relationships like so.
- I'm unclear on your examples, which seem to imply some "hierarchy" similar to a UD results overlay: your 1's, 2's, 3's, and 4's. I cannot see the data modeling, which can have different flexible data representations. A "hierarchy" (which might use a dictionary type like Python's), a "tuple", a "vector", and a "set" are different things and representations. I see you map "cucumber" to 1, and "as" to 4. Are they distances, levels, what?
- The current uses of series ordinal (P1545) as qualifiers to combines lexemes (P5238) typically represent the order in which the components of a lexeme appear in it (e.g. "cool" appears first, "as" second, and so on). The component "as" having a P1545 qualifier of "2" and a "location of head" value of "4" means that an edge exists from "as" pointing to "cucumber" (which has a P1545 value of "4"). As Wikidata statements cannot have other statements or more complex types of structures as values, the use of these two properties to represent an edge in a dependency graph is the best possible that I can come up with at the moment.
- Got it! That works nicely for now. -Thadguidry (talk) 03:36, 9 July 2021 (UTC)
- Further, what does that mean to a developer or presentation in a SPARQL result that might use different data modeling (data overlays) with a Lexeme data representation? In OpenRefine we might later have different Data Representation for various Data Models (actually we have this now, but plan to expand it more) and this might include Lexemes, even though our EPIC that I drafted with lots of ideas uses Records as examples it could use Lexemes or anything needing Flexible Data Representation https://github.com/OpenRefine/OpenRefine/issues/2825 -Thadguidry (talk) 02:57, 9 July 2021 (UTC)
- Can you elaborate on this part as it pertains to lexemes? Are you taking issue with these properties being useful only in the context of dependency grammar modeling (as opposed to, say, phrase structure grammars)? Mahir256 (talk) 03:26, 9 July 2021 (UTC)
- No issue, the edges are clearly represented and retain their labels as your query example shows. -Thadguidry (talk) 03:36, 9 July 2021 (UTC)
- Support Thanks, that's what I needed to see and know, so yes this is useful! I would hope after approval that some nice docs and good examples are kept around. Let's not lose the knowledge. -Thadguidry (talk) 03:36, 9 July 2021 (UTC)
- Support, no concerns. Regards, ZI Jony (Talk) 13:37, 9 July 2021 (UTC)
- @Mahir256, Robert Važan, Thadguidry, ZI Jony: Done as syntactic dependency head relationship (P9763) and syntactic dependency head position (P9764). UWashPrincipalCataloger (talk) 16:39, 30 July 2021 (UTC)
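As a rough illustration of the edge representation discussed above, the sketch below reads simplified component records and lists each dependent with its head. The record layout, the surface words standing in for the combined lexemes, and the function name `dependency_edges` are all hypothetical (this is not Wikibase JSON or any real API); the head of "a" is an assumed value, and the relationship items for P9763 are left out because they are not given here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified records for cool as a cucumber (L559269),
# deliberately out of order. "P1545" is the series ordinal and "P9764"
# the head position, both stored as strings.
components: List[Dict[str, Optional[str]]] = [
    {"word": "cucumber", "P1545": "4", "P9764": "1"},
    {"word": "cool", "P1545": "1", "P9764": None},  # assumed root: no head
    {"word": "a", "P1545": "3", "P9764": "4"},      # assumed head position
    {"word": "as", "P1545": "2", "P9764": "4"},
]


def dependency_edges(
    records: List[Dict[str, Optional[str]]],
) -> List[Tuple[str, str]]:
    """Return (dependent, head) pairs in the order the components appear."""
    by_ordinal = {r["P1545"]: r for r in records}
    # Sort numerically: as plain strings, "10" would sort before "2".
    # Assumption: every ordinal used here is a plain integer string.
    ordered = sorted(records, key=lambda r: int(r["P1545"]))
    edges = []
    for record in ordered:
        head = record["P9764"]
        if head is not None:
            edges.append((record["word"], by_ordinal[head]["word"]))
    return edges


for dependent, head in dependency_edges(components):
    print(f"{dependent} -> {head}")
```

Looking heads up through a dictionary keyed on the ordinal string keeps the match exact, so a head position only resolves when some component carries exactly that series ordinal.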