Hello,
I am working on a project that used to be called wikimark but is now simply
called sensimark, because it is more general than just Wikipedia.
Anyway, the project is as follows:
1. Gather an annotated hierarchical dataset of documents, where the internal
nodes are categories and subcategories (and maybe sub-subcategories) and the
leaf nodes are documents.
2. Train an algorithm on those documents.
3. Guess the most probable labels for a new document.
The algorithm works at the paragraph level; the goal is to be able to do
something like the following:
out = wikimark("""Peter Hintjens wrote about the relation between
technology and culture. Without using the scientific tone of a
state-of-the-art review of Anthropocene anthropology, he gives a
fair amount of food for thought. According to Hintjens, technology is
doomed to become cheap. As a matter of fact, intelligence tools will
become more and more accessible, which will trigger a revolution to
rebalance forces in society.""")
for category, score in out:
    print('{} ~ {}'.format(category, score))
And the output would be:
Art ~ 0.2
Science ~ 0.8
Society ~ 0.4
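To make the intent concrete, here is a minimal sketch of one way such
per-category scores could be produced (TF-IDF centroids plus cosine
similarity, with toy data); it is only an illustration, not necessarily how
wikimark.py is implemented:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy training data: category -> list of documents (stand-in for the real dataset).
corpus = {
    "Art": ["painting sculpture museum exhibition", "poetry novel theatre opera"],
    "Science": ["physics experiment theory measurement", "biology cell evolution genome"],
    "Society": ["culture politics economy institutions", "technology labour social change"],
}

all_documents = [doc for docs in corpus.values() for doc in docs]
vectorizer = TfidfVectorizer().fit(all_documents)

# One centroid vector per category.
centroids = {
    category: np.asarray(vectorizer.transform(docs).mean(axis=0))
    for category, docs in corpus.items()
}

def guess(paragraph):
    """Return (category, score) pairs for a new paragraph."""
    vector = vectorizer.transform([paragraph])
    return [(category, cosine_similarity(vector, centroid)[0, 0])
            for category, centroid in centroids.items()]

for category, score in guess("intelligence tools will become cheap and accessible"):
    print('{} ~ {}'.format(category, score))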
That is the goal of the project, but we are not there yet.
I read that classification in encyclopedic terms is a complex matter, but I
have settled on the Wikipedia vital articles and run experiments on level 3,
which are encouraging.
Here is the output of my program before post-processing:
$ curl https://github.com/cultureandempire/cultureandempire.github.io/blob/master/…
| ./wikimark.py guess build/
similarity
+-- Technology ~ 0.09932770275501317
| +-- General ~ 0.09932770275501317
+-- Science ~ 0.09905069171042175
| +-- General ~ 0.09905069171042175
+-- Geography ~ 0.09897996204391411
| +-- Continents and regions ~ 0.09914627336640339
| +-- General ~ 0.09881365072142484
+-- Mathematics ~ 0.09897542847422805
| +-- Other ~ 0.09911568655298664
| +-- Arithmetic ~ 0.09883517039546945
+-- Society and social sciences ~ 0.09886767613461538
| +-- Social issues ~ 0.09886767613461538
+-- History ~ 0.09886377525104235
+-- General ~ 0.09893293240770612
+-- History by subject matter ~ 0.09884456012696491
+-- Post-classical history ~ 0.09881383321845605
The algorithm selected the 10 most relevant subcategories out of 73.
Now I need to scale this to level 5
<https://en.wikipedia.org/wiki/Wikipedia:Vital_articles/Level/5>, but
it is poorly organized.
Can Wikidata help with the Wikipedia vital articles?
ref: https://en.wikipedia.org/wiki/Wikipedia_talk:Vital_articles#Wikidata_integr…
ref: https://github.com/amirouche/sensimark
FYI: scholarship applications for Wikimania 2019 are now open until Friday,
15 March.
---------- Forwarded message ---------
From: DerHexer <derhexer(a)wikipedia.de>
Date: Wed, 27 Feb 2019 at 20:09
Subject: [Wikimedia-l] Wikimania 2019 scholarship applications open
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Cc: Arild Vågen <arildvagen2(a)gmail.com>
Hi all,
We wanted to inform you that scholarship applications for Wikimania 2019,
which is being held in Stockholm, Sweden, on August 14–18, 2019, are now
being accepted. Applications are open until Friday, 15 March 2019 23:59 UTC.
Applicants will be able to apply for a partial or full scholarship. A full
scholarship will cover the cost of an individual's round-trip travel,
shared accommodation, and conference registration fees as arranged by the
Wikimedia Foundation. A partial scholarship will cover conference
registration fees and shared accommodation.
Applicants will be rated using a pre-determined selection process and
selection criteria established by the Scholarship Committee and the
Wikimedia Foundation, who will determine which applications are successful.
To learn more about Wikimania 2019 scholarships, please visit:
https://wikimania.wikimedia.org/wiki/Scholarships
To apply for a scholarship, fill out the multi-language application form
on: https://scholarships.wikimedia.org/apply
It is highly recommended that applicants review all the material on the
Scholarships page and the associated FAQ (
https://wikimania.wikimedia.org/wiki/Scholarships/FAQ ) before submitting
an application.
If you have any questions, please contact: wikimania-scholarships at
wikimedia.org or leave a message at:
https://wikimania.wikimedia.org/wiki/Talk:Scholarships .
Please help us spread the word and translate pages!
Best regards,
Arild Vågen and Martin Rulsch
for the Scholarship Committee
https://wikimania.wikimedia.org/wiki/Scholarship_Committee
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:[email protected]?subject=unsubscribe>
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
When dealing with Wikipedia categories and placing a statement about a
"topic's main category" (P910)
<https://www.wikidata.org/wiki/Property:P910>
on a symbolic-representation topic of "cow", here
https://www.wikidata.org/wiki/Q4007160,
a potential issue appeared today that puzzled me:
according to the inverse constraint, "Category:Heraldic figures" should also
have the inverse statement "category's main topic: cow".
But Category:Heraldic figures is about much more than just cows, and so I
wouldn't want to say that the category's main topic is "cow"!
I would think that many categories can be about many things, so this
inverse-property constraint on P910 seems overly protective *and also
seems to be encouraging users to add bad, erroneous information.*
So, I wonder if this constraint should be removed or special-cased
somehow, somewhere, or at some level elsewhere?
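For reference, here is a rough sketch of how one can inspect this situation
from the public Wikidata Query Service (the properties P910 and P301 are the
real ones; the item QID is the one linked above, and whether anything comes
back depends on its current statements):

import requests

WDQS = "https://query.wikidata.org/sparql"

# List the "topic's main category" (P910) values of the item above and, for
# each of those categories, the "category's main topic" (P301) it declares,
# i.e. where the inverse constraint would point.
QUERY = """
SELECT ?category ?categoryLabel ?mainTopic ?mainTopicLabel WHERE {
  wd:Q4007160 wdt:P910 ?category .
  OPTIONAL { ?category wdt:P301 ?mainTopic . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(WDQS, params={"query": QUERY, "format": "json"},
                        headers={"User-Agent": "p910-constraint-check/0.1"})
for row in response.json()["results"]["bindings"]:
    print(row["categoryLabel"]["value"], "->",
          row.get("mainTopicLabel", {}).get("value", "(no P301 set)"))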
See attached screenshot.
Thad
https://www.linkedin.com/in/thadguidry/
Dear all,
I have created a small SPARQL query service for Wikidata history. It
is at a very early stage.
It currently stores metadata about each Wikidata item or property
revision (contributor, timestamp, entity edited, previous/next
revision of the given entity) and a part of the revisions' content
(wdt: direct claim relations and redirects). It allows querying the
triples added and removed by a revision, and querying the full state of
the Wikidata graph after any revision. The data loaded covers the range
from the creation of Wikidata to July 1st, 2018.
The help page: https://www.wikidata.org/wiki/Wikidata:History_Query_Service
The query UI: https://wdhqs.wmflabs.org/
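As a very rough illustration, querying it from Python could look like the
sketch below. Note that the exact SPARQL endpoint path and the way revisions
are selected are documented on the help page; the endpoint URL used here is
only an assumption, and the query only reads wdt: (direct claim) triples,
which is the part of the content the service loads.

from SPARQLWrapper import SPARQLWrapper, JSON

# Assumption: endpoint path; check the help page above for the actual one.
ENDPOINT = "https://wdhqs.wmflabs.org/sparql"

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Q42 (Douglas Adams) is just an example item; only wdt: triples are stored.
SELECT ?property ?value WHERE {
  wd:Q42 ?property ?value .
  FILTER(STRSTARTS(STR(?property), STR(wdt:)))
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["property"]["value"], row["value"]["value"])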
I hope it will be useful for doing interesting research around Wikidata.
Feel free to email me if you need help with it. If it proves useful to
many people, I hope to be able to take the time and get the storage space
to load more recent data and the missing parts of the Wikidata items'
content.
Best,
Thomas (User:Tpt)
Hello Michal,
Thanks for sharing this information. I hope that, while building this
project, the existing resources will be browsed and reused.
Over the years, the community has built various resources, in a range of
formats, about the Wikidata Query Service and SPARQL:
- The portal page linking to resources
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_…
- A gentle introduction
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/A_gentle_introd…
- A detailed tutorial (both could be translated into more languages)
https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
- A full video training by Asaf https://www.youtube.com/watch?v=kJph4q0Im98
- A book on Wikibooks https://en.wikibooks.org/wiki/SPARQL
- A list of various examples
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
- A page where the community can answer questions and provide help
https://www.wikidata.org/wiki/Wikidata:Request_a_query
Of course, all of these content pages could probably be improved and better
translated, but I think we're not starting from scratch here :)
Cheers,
Léa
On Thu, 14 Feb 2019 at 10:41, Michal Lester <mlester(a)wikimedia.org.il>
wrote:
> Dear all,
> One recurring phenomenon we encounter when we present Wikidata to various
> audiences is the enthusiasm with which they react to the Wikidata Query Service,
> and the possibilities it offers for extracting specific information from
> the vast network of linked and structured data contained in Wikidata. This
> enthusiasm is not surprising, as the query feature of Wikidata is quite
> unique within the landscape of information services available today.
> The Wikidata Query Service is powered by SPARQL – a semantic query language
> for databases. Unfortunately, for users who are new to Wikimedia platforms,
> there is currently little instructional material on how to learn SPARQL for
> use in Wikidata. At Wikimedia Israel we believe that a user-friendly
> tutorial to Queries/SPARQL will attract new users to engage with Wikidata
> and help build a community around the project.
> In recent years, Wikimedia Israel has developed online instructional
> materials, such as the Wikipedia courseware and the guide for creating
> encyclopedic content. We plan to use our experience in this field, and in
> collaboration with Wikimedia Deutschland, we intend to develop a website
> with a step-by-step tutorial to learn how to use the Wikidata Query
> Service. The instructional material will be available in three languages
> (Hebrew, Arabic and English) but it will be possible to add the same
> instructions in other languages. We are quite confident that having a
> tutorial that explains and teaches the Query Service will help expand
> Wikidata to new audiences worldwide.
>
> *Best regards,*
>
>
> *Michal Lester,*
> *Executive Director, Wikimedia Israel*
> _______________________________________________
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: Wikimedia-l(a)lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:[email protected]?subject=unsubscribe>
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
Dear all,
After a long time of refactoring, we are about to start releasing
DBpedia again, along with a new mission and identity.
The release notes are here:
https://docs.google.com/document/d/1hv3HCTumKxbaMQe936TyjwAl5_1v6BjYng-N5W3…
(note that we are waiting for the board to give final approval here, but
it should be published this week)
Now, before there are any misconceptions, we would like to get the
Wikidata and DBpedia story a bit straighter:
1. DBpedia isn't really about the data. Data is produced and it is
useful, but the main point was always to push the way knowledge
extraction is done: how data is structured, debugged, published,
maintained, delivered, hosted (e.g. http://fragments.dbpedia.org/) and
customised. For example, DBpedia's CTO Dimitris edited the SHACL
(https://www.w3.org/TR/shacl/) standard for data constraints, which is
now supported by many triplestores and engines (not sure about Neptune
or Blazegraph, though).
2. The announcement is still missing an acknowledgement section, and
instead of moving into any more awkwardness, I would like to include a
statement that makes clear that the recent advances we made are not
against Wikidata, but because of Wikidata. We clearly see the benefits
of having the well-structured space of Wikidata in addition to the 140
Wikipedia language editions. For us it is very good that you have
star-shaped interlanguage article links instead of spaghetti, as well as
stable properties and an edit interface (we had OntoWiki
http://ontowiki.net/ in 2005, but it violates data governance on
extracted data and we will probably never build an edit interface for
the data). So I would like to clearly phrase that the refactoring is
partly driven by the achievements of Wikidata. Only partly for now,
because Goal 4 (ID Management and Fusion) is the step that can be
directly traced to Wikidata's achievements, but it is just a prototype
and a Bachelor thesis at the moment.
All feedback is welcome.
--
All the best,
Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT)
Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org,
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
<http://www.w3.org/community/ld4lt>
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Hello everyone,
I am developing an application that requires many SPARQL queries to the
Wikidata server (1k-2k each time I test it). The application front-end is
visible here: https://beta.water-fountains.org
I am just making SPARQL queries from a Node.js server to get information
about fountains.
This has been working fine for about a year, until yesterday, when the
connection started returning ECONNRESET or "Socket hang up". If I run the
query from the browser instead of the server it works fine, and if I use a
VPN, the Node.js server is able to perform the queries without issue. Here is
an example query: link
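For context, each request is essentially equivalent to the sketch below
(written in Python rather than Node.js for brevity; the query, User-Agent
value and retry/backoff parameters are just illustrative placeholders, not
the real ones from the app):

import time
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
# Illustrative query only: a few items that are instances of (P31) fountain (Q483453).
QUERY = "SELECT ?fountain WHERE { ?fountain wdt:P31 wd:Q483453 . } LIMIT 5"

def run_query(query, retries=3, backoff=2.0):
    """Send a SPARQL query with a descriptive User-Agent and simple backoff."""
    headers = {"User-Agent": "water-fountains-example/0.1 (contact placeholder)"}
    for attempt in range(retries):
        try:
            response = requests.get(ENDPOINT,
                                    params={"query": query, "format": "json"},
                                    headers=headers, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))

print(len(run_query(QUERY)["results"]["bindings"]), "results")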
Has anyone else had such an issue? Thanks!
Matthew
Hey folks :)
Ariel would like to make changes to the scheduling of the dumps. As
far as I can tell this should be fine. But if you use the dumps and
this would be an issue, please weigh in on the ticket.
Cheers
Lydia
---------- Forwarded message ---------
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
Date: Sat, Feb 16, 2019 at 5:33 PM
Subject: [Wikitech-l] question about wikidata entity dumps usage
(please forward to interested parties)
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>, Wikipedia
Xmldatadumps-l <Xmldatadumps-l(a)lists.wikimedia.org>,
<research-internal(a)lists.wikimedia.org>
Hey folks,
We've had a request to reschedule the way the various Wikidata entity dumps
are run. Right now they go once a week on set days of the week; we've been
asked about pegging them to specific days of the month instead, much as the
XML/SQL dumps are run. See https://phabricator.wikimedia.org/T216160 for
more info.
Is this going to cause problems for anyone? Do you ingest these dumps on a
schedule, and what works for you? Please weigh in here or on the
phabricator task; thanks!
Ariel
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.