Publications

All publications and implementations are made publicly available. If you cannot obtain a particular resource, feel free to drop me an email.


* indicates equal contribution.

(2024). Learning a Decision Tree Algorithm with Transformers. arXiv:2402.03774 [cs].

PDF

(2023). Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs. Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024).

PDF

(2023). Fast-ELECTRA for Efficient Pre-training. Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024).

PDF

(2023). Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024). Selected as Oral.

PDF

(2023). Sparse Backpropagation for MoE Training. WANT@NeurIPS 2023. Selected as Oral.

PDF

(2023). Bridging Discrete and Backpropagation: Straight-Through and Beyond. Proceedings of the Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023). Selected as Oral.

PDF

(2022). Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting. Proceedings of the Thirty-sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022). Selected as Oral.

PDF

(2022). Understand and Modularize Generator Optimization in ELECTRA-style Pretraining. Proceedings of the Fortieth International Conference on Machine Learning (ICML 2023).

PDF

(2022). Toward Student-oriented Teacher Network Training for Knowledge Distillation. Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024).

PDF

(2022). PILED: An Identify-and-Localize Framework for Few-Shot Event Detection. arXiv:2202.07615 [cs].

PDF

(2021). Multi-head or Single-head? An Empirical Comparison for Transformer Training. the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD 2021).

PDF Code

(2021). Multi-head or Single-head? An Empirical Comparison for Transformer Training. arXiv:2106.09650 [cs].

PDF

(2020). Empower Distantly Supervised Relation Extraction with Collaborative Adversarial Training. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021).

(2020). Overfitting or Underfitting? Understand Robustness Drop in Adversarial Training. arXiv:2010.08034 [cs].

PDF Code

(2020). On the Transformer Growth for Progressive BERT Training. arXiv:2010.12562 [cs].

PDF

(2020). Very Deep Transformers for Neural Machine Translation. arXiv:2008.07772 [cs].

PDF Code

(2020). Towards Adaptive Residual Network Training: A Neural-ODE Perspective. the Thirty-seventh International Conference on Machine Learning (ICML 2020).

PDF Code Supplemental

(2020). Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling. the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).

PDF Code

(2020). Facet-Aware Evaluation for Extractive Text Summarization. the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).

PDF Code

(2020). Joint Aspect-Sentiment Analysis with Minimal User Guidance. the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020).

PDF

(2020). Understanding the Difficulty of Training Transformers. the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). Selected as Oral.

PDF Code Slide

(2020). On the Variance of the Adaptive Learning Rate and Beyond. the Eighth International Conference on Learning Representations (ICLR 2020).

PDF Code

(2020). NetTaxo: Automated Topic Taxonomy Construction from Large-Scale Text-Rich Network. the 2020 Web Conference (WWW 2020).

PDF Code

(2019). Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction. the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019).

PDF Code Blog (3rd-Party)

(2019). CrossWeigh: Training Named Entity Tagger from Imperfect Annotations. the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019).

PDF Code Blog (3rd-Party) Video (3rd-Party)

(2019). Raw-to-End Name Entity Recognition in Social Media. arXiv:1908.05344 [cs].

PDF Code

(2019). Arabic Named Entity Recognition: What Works and What's Next. the Fourth Arabic Natural Language Processing Workshop (WANLP 2019).

PDF Code

(2019). Reliability-aware Dynamic Feature Composition for Name Tagging. the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).

PDF Code

(2019). Constructing and Mining Heterogeneous Information Networks from Massive Text. Conference tutorial at the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2019).

PDF Tutorial Page Slides

(2019). Cross-Relation Cross-Bag Attention for Distantly-Supervised Relation Extraction. the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019).

PDF DOI

(2018). Learning Named Entity Tagger using Domain-Specific Dictionary. the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).

PDF Code Blog Blog (Chinese) Doc

(2018). Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling. the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018). Selected as Oral.

PDF Code Blog Doc

(2018). Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings. arXiv:1803.03370 [cs].

PDF

(2018). Empower Sequence Labeling with Task-Aware Neural Language Model. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018). Selected as Oral.

PDF Code (new) Code (old) Blog

(2018). Contrast Subgraph Mining from Coherent Cores. arXiv:1802.06189 [cs].

PDF

(2017). Wikidata Vandalism Detection - The Loganberry Vandalism Detector at WSDM Cup 2017. arXiv:1712.06922 [cs].

PDF

(2017). Graph Clustering with Embedding Propagation. the 2020 IEEE International Conference on Big Data (IEEE BigData 2020).

PDF

(2017). Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach. the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).

PDF Code Blog Slides

(2017). TrioVecEvent: Embedding-based Online Local Event Detection in Geo-tagged Tweet Streams. the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017).

PDF

(2015). Community Detection Based on Structure and Content: A Content Propagation Perspective. the 2015 IEEE International Conference on Data Mining (ICDM 2015).

PDF Code