Constructing and Mining Heterogeneous Information Networks from Massive Text

Abstract

Real-world data exists largely in the form of unstructured texts. A grand challenge on data mining research is to develop effective and scalable methods that may transform unstructured text into structured knowledge. Based on our vision, it is highly beneficial to transform such text into structured heterogeneous information networks, on which actionable knowledge can be generated based on the user’s need. In this tutorial, we provide a comprehensive overview on recent research and development in this direction. First, we introduce a series of effective methods that construct heterogeneous information networks from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks based on the user’s need. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.

Publication
Conference tutorial at the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2019)
Avatar
Liyuan Liu
Senior Researcher @ MSR

Understand the underlying mechanism of pretraining heuristics.