Wikidata Vandalism Detection - The Loganberry Vandalism Detector at WSDM Cup 2017

Abstract

Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. As it can be edited by anyone, entries frequently get vandalized, leading to the possibility that it might spread of falsified information if such posts are not detected. The WSDM 2017 Wiki Vandalism Detection Challenge requires us to solve this problem by computing a vandalism score denoting the likelihood that a revision corresponds to an act of vandalism and performance is measured using the ROC-AUC obtained on a held-out test set. This paper provides the details of our submission that obtained an ROC-AUC score of 0.91976 in the final evaluation.

Publication
arXiv:1712.06922 [cs]
Avatar
Liyuan Liu
Senior Researcher @ MSR

Understand the underlying mechanism of pretraining heuristics.