Wiki Workshop 2020

A forum bringing together researchers exploring all aspects of Wikimedia projects. Held at The Web Conference 2020 in Taipei, Taiwan, 21 April 2020.

  • April 21, 2020: Workshop took place -- thank you to all of you who participated. (Papers, Slides)
  • April 21, 2020: Wiki Workshop day! If you have registered through the Web Conference or Wiki Workshop, check your email for instructions to join.
  • Apr. 16, 2020: Register for Wiki Workshop 2020 here.
  • Feb. 11, 2020: Benjamin Mako Hill (University of Washington) confirmed as invited speaker.
  • Jan. 30, 2020: Misha Teplitskiy (University of Michigan) confirmed as invited speaker.
  • Jan. 30, 2020: Kristina Lerman (USC ISI) confirmed as invited speaker.
  • Jan. 30, 2020: Mark Graham (Internet Archive) confirmed as invited speaker.
  • Dec. 4, 2019: Workshop date announced: Tuesday, 21 April 2020.
  • Nov. 15, 2019: Wiki Workshop 2020 webpage online.
5:00 - 5:20 Welcome and icebreaking
5:20 - 6:05 Keynote: Jess Wade (Imperial College London). (Video, Slides)
6:05 - 6:15 Break
6:15 - 7:00 A conversation with Mark Graham (Internet Archive), moderated by Bob West (EPFL). (Video)
7:00 - 7:05 Break
7:13 - 8:00 A panel conversation with Benjamin Mako Hill (U of Washington), Misha Teplitskiy (U of Michigan), Kristina Lerman (USC), Jérôme Hergueux (CNRS). (Video)
8:00 - 8:10 Break
8:10 - 9:15 Featured & lightning talks (Video, Slides)
9:15 - 9:20 Wrap up
9:30 - 10:15 Poster session
10:15 - 11:00 The social event

Jess Wade (Imperial College London)

Bio
Jessica Wade BEM is a British physicist in the Blackett Laboratory at Imperial College London. Her research investigates polymer-based organic light-emitting diodes (OLEDs). Her public engagement involves work in science, technology, engineering, and mathematics (STEM) as well as championing women in physics and tackling gender bias on Wikipedia. As of May 2020, Jess has written more than 1,000 biographies of women scientists on English Wikipedia.

Mark Graham (Internet Archive)

Bio
Mark Graham has created and managed innovative online products and services since 1984. As Director of the Wayback Machine, he is responsible for capturing and preserving more than 1 billion new web captures each week, and for helping people discover and use them. Previously, Mark was Senior Vice President with NBC News, Senior Vice President of Technology with iVillage, and a co-founder of Rojo Networks. In the early days of the net he managed technology and business development at The WELL and also helped bring the pre-web Internet to millions of people by running AOL's Gopher project as part of their Internet Center. He managed technology for the pioneering US-Soviet Sovam Teleport email service and co-founded and managed PeaceNet, IGC.org, and APC.org. Mark's early training and experience with computer-mediated communications were acquired while he served in the US Air Force, spending more than 3 years working at the Air Force Data Services Center at the Pentagon.
Jérôme Hergueux
Bio
Jérôme is an Assistant Research Professor at the French National Center for Scientific Research (CNRS), a Research Affiliate at the Center for Law and Economics at ETH Zurich, and a Faculty Associate at the Berkman Klein Center for Internet & Society at Harvard University. Jérôme is a behavioral economist operating at the boundaries between psychology, economics and computational social science. In his research, he couples experimental methods with the analysis of big data to uncover how psychological and cognitive traits shape our behavior over the Internet, with a particular focus on online cooperation, peer production and decision making. Jérôme's most recent research leverages field data from several Internet-based environments, such as online poker, open source software development platforms, and Wikipedia. His current interests relate to the study of how people deal with their decision-making biases, as well as the determinants and consequences of communication style and leadership in virtual teamwork.

Benjamin Mako Hill (University of Washington)

Bio
Benjamin Mako Hill is an Assistant Professor in the University of Washington Department of Communication, an Adjunct Assistant Professor in the departments of Human-Centered Design and Engineering and Computer Science and Engineering, and Affiliate Faculty in the Center for Statistics and the Social Sciences, the eScience Institute, and the "Design Use Build" (DUB) group that supports research on human-computer interaction. He is also a Faculty Associate at the Berkman Klein Center for Internet and Society at Harvard University and an affiliate of the Institute of Quantitative Social Science at Harvard. Mako studies collective action in online communities and seeks to understand why some attempts at collaborative production (like Wikipedia and Linux) build large volunteer communities while the vast majority never attract even a second contributor. His research is deeply interdisciplinary, consists primarily of "big data" quantitative analyses, and lies at the intersection of communication, human-computer interaction, and sociology. Mako received his PhD from MIT.

Kristina Lerman (USC ISI)

Bio
Kristina Lerman is a Project Leader at the Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Viterbi School of Engineering's Computer Science Department. Her research focuses on applying network- and machine learning-based methods to problems in social computing.

Misha Teplitskiy (University of Michigan)

Bio
Misha Teplitskiy is an Assistant Professor at the School of Information, University of Michigan. His research is at the intersection of Science of Science, Sociology of Organizations, and Computational Social Science. He studies how social and organizational factors affect scientific discovery. He is especially interested in evaluation practices in science, and whether they promote or stifle innovation. His approach relies primarily on field experiments (interventions in scientific competitions and other settings) and on applying computational tools to large-scale observational data. Previously, he was a postdoc at the Laboratory for Innovation Science at Harvard (LISH). He received his PhD in Sociology from the University of Chicago, where he was a member of KnowledgeLab.
Ziang Chuai, Qian Geng and Jian Jin
Domain-Specific Automatic Scholar Profiling Based on Wikipedia [PDF]
Chien-Chun Ni, Kin Sum Liu and Nicolas Torzec
Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph [PDF]
Marie Destandau and Jean-Daniel Fekete
Diagnosing Incompleteness in Wikidata with The Missing Path [PDF]
Kateryna Liubonko and Diego Sáez-Trumper
Matching Ukrainian Wikipedia Red Links with English Wikipedia’s Articles [PDF]
Yessica Herrera-Guzman, Eduardo Graells-Garrido and Diego Caro
Beyond Performing Arts: Network Composition and Collaboration Patterns [PDF]
Ang Li and Rosta Farzan
Collaboration of open content news in Wikipedia: The role and impact of gatekeepers [PDF]
Kai Zhu, Dylan Walker and Lev Muchnik
Content Growth and Attention Contagion in Information Networks: Addressing Information Poverty on Wikipedia [PDF]
Pablo Beytía
The Positioning Matters: Estimating Geographical Bias in the Multilingual Record of Biographies on Wikipedia [PDF]
Natalie Bolón Brun, Sofia Kypraiou, Natalia Gullón Altés and Irene Petlacalco Barrios
Wikigender: A Machine Learning Model to Detect Gender Bias in Wikipedia [PDF]
Nicholas Vincent and Brent Hecht
A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines [PDF]
Blagoj Mitrevski, Tiziano Piccardi and Robert West
WikiHist.html: English Wikipedia’s Full Revision History in HTML Format [PDF]
Ai-Jou Chou, Guilherme Gonçalves, Sam Walton and Miriam Redi
Citation Detective: a Public Dataset to Improve and Quantify Wikipedia Citation Quality at Scale [PDF]
Volodymyr Miz, Joëlle Hanna, Nicolas Aspert, Benjamin Ricaud and Pierre Vandergheynst
What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions [PDF]

Workshop date: Tuesday, 21 April 2020

If authors want their paper to appear in the proceedings:

  • Submission deadline: 17 January 2020
  • Author feedback: 3 February 2020
  • Camera-ready version due: 17 February 2020

If authors do not want their paper to appear in the proceedings:

  • Submission deadline: 21 February 2020
  • Author feedback: 6 March 2020
Note: If you need a visa to travel to Taiwan and your application for the visa depends on your workshop paper being accepted, we would advise you to submit your workshop paper for the 17 January deadline. (You could still opt for not having your paper included in the proceedings.)

Wikipedia is one of the most popular sites on the Web, a main source of knowledge for a large fraction of Internet users, and one of the very few projects that make not only their content but also many activity logs available to the public. Furthermore, other Wikimedia projects, such as Wikidata and Wikimedia Commons, have been created to share other types of knowledge with the world for free. For a variety of reasons (quality and quantity of content, reach in many languages, process of content production, availability of data, etc.) such projects have become important objects of study for researchers across many subfields of the computational and social sciences, such as social network analysis, artificial intelligence, linguistics, natural language processing, social psychology, education, anthropology, political science, human–computer interaction, and cognitive science.

The goal of this workshop is to bring together researchers exploring all aspects of Wikimedia projects such as Wikipedia, Wikidata, and Commons. With members of the Wikimedia Foundation's Research team on the organizing committee and with the experience of successful workshops in 2015, 2016, 2017, 2018, and 2019, we aim to continue facilitating a direct pathway for exchanging ideas between the organization that coordinates Wikimedia projects and the researchers interested in studying them.

Topics of interest include, but are not limited to:

  • new technologies and initiatives to grow content, quality, diversity, and participation across Wikimedia projects
  • use of bots, algorithms, and crowdsourcing strategies to curate, source, or verify content and structured data
  • bias in content and gaps of knowledge
  • diversity of Wikimedia editors and users
  • detection of low-quality, promotional, or fake content, as well as fake accounts (e.g., sock puppets)
  • questions related to community health (e.g., sentiment analysis, harassment detection)
  • understanding editor motivations, engagement models, and incentives
  • Wikimedia consumer motivations and their needs: readers, researchers, tool/API developers
  • innovative uses of Wikipedia and other Wikimedia projects for AI and NLP applications
  • consensus-finding and conflict resolution on editorial issues
  • participation in discussions and their dynamics
  • dynamics of content reuse across projects and the impact of policies and community norms on reuse
  • privacy
  • collaborative content creation (unstructured, semi-structured, or structured)
  • innovative uses of Wikimedia projects' content and consumption patterns as sensors for real-world events, culture, etc.
  • open-source research code, datasets, and tools to support research on Wikimedia contents and communities

Papers should be 1 to 8 pages long and will be published on the workshop webpage and optionally (depending on the authors' choice) in the workshop proceedings. The review process will be single-blind (as opposed to double-blind), i.e., authors should include their names and affiliations in their submissions. Authors whose papers are accepted to the workshop will have the opportunity to participate in a poster session.

We explicitly encourage the submission of preliminary work in the form of extended abstracts (1 or 2 pages).

For submission dates, see above.

  • Pushkal Agarwal, King's College London
  • Giovanni Colavizza, University of Amsterdam
  • Martin Gerlach, Wikimedia Foundation
  • Kristina Gligorić, EPFL
  • Isaac Johnson, Wikimedia Foundation
  • Markus Krötzsch, University of Dresden
  • Florian Lemmerich, RWTH Aachen University
  • Jonathan Morgan, Wikimedia Foundation
  • Maxime Peyrard, EPFL
  • Tiziano Piccardi, EPFL
  • Diego Saez-Trumper, Wikimedia Foundation
  • Morten Warncke-Wang, Wikimedia Foundation
  • Ramtin Yazdanian, EPFL
  • Amy Zhang, MIT

Miriam Redi

Miriam is a Research Scientist at the Wikimedia Foundation and Visiting Research Fellow at King's College London. Formerly, she worked as a Research Scientist at Yahoo! Labs in Barcelona and Nokia Bell Labs in Cambridge. She received her PhD from EURECOM, Sophia Antipolis. She conducts research in social multimedia computing, working on fair, interpretable, multimodal machine learning solutions to improve knowledge equity.

Leila Zia

Leila is a senior research scientist at the Wikimedia Foundation. Her current research interests are on understanding Wikipedia's readers, quantifying and addressing the gaps of knowledge in Wikipedia and Wikidata, and understanding and improving diversity in Wikipedia. She holds a PhD in management science and engineering from Stanford University.

Robert West

Bob is an assistant professor of Computer Science at EPFL, where he heads the Data Science Lab. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. He holds a PhD in computer science from Stanford University.

Please direct your questions to wikiworkshop@googlegroups.com.