IEEE International Conference on Data Mining

2023 IEEE ICDM Research Contributions Award: Professor Jie Tang

The IEEE ICDM Research Contributions Award is the highest recognition for research achievements in Data Mining. It is given annually to one individual or one group who has made influential research contributions to the field of Data Mining. The 2023 IEEE ICDM Research Contributions Award goes to Professor Jie Tang of Tsinghua University, China.

Dr. Jie Tang is a WeBank Chair Professor of Computer Science at Tsinghua University and Director of the Foundation Model Research Center (FMR) of the AI Institute of Tsinghua University. He is a Fellow of the ACM, AAAI, and IEEE. His research interests include artificial general intelligence (AGI), data mining, and social networks.

Professor Tang has made pioneering contributions to large language models and foundation models, data mining systems, and network science. His publications have been cited more than 32,000 times, and his h-index is 92. He has authored the open-source GLM series (https://github.com/THUDM/), a family of general-purpose large language models. The GLM base model has been downloaded by more than 1,000 organizations from 70+ countries, and the open ChatGLM-6B series has also been widely adopted, with nearly 10,000,000 downloads worldwide according to Hugging Face. Professor Tang also created AMiner.cn, an academic network search and mining system that has been in operation since March 2006. He has contributed fundamental algorithms for social network mining as well as the transformative techniques used in AMiner, and the paper describing AMiner won the SIGKDD Test-of-Time Award (10-Year Best Paper). To date, AMiner has attracted 30,000,000 users from 220 countries and regions.

Large Language Models: One of Professor Tang's ground-breaking contributions is his work on large-scale pre-trained models, such as GLM-130B (ICLR '23), ChatGLM-130B, and CodeGeeX (KDD '23). He initiated and developed the General Language Model (GLM) project, which represents a step toward a highly generalized form of AI, sometimes referred to as artificial general intelligence. The GLM architecture (ACL '22) simultaneously addresses the challenges of language understanding (as in BERT) and language generation (as in GPT). He developed the GLM-130B model, an open bilingual (English and Chinese) bidirectional pre-trained model with 130 billion parameters. In 2021, Forbes described the GLM project as "a system that surpassed OpenAI's GPT-3 in so many ways". Unlike big tech companies (e.g., Google, OpenAI, and DeepMind), which keep their large pre-trained models in-house or offer them only as paid services, his team has open-sourced all of its billion- and trillion-scale pre-trained models and made them publicly available to the research community, with the goal of democratizing AI. As a result of these efforts in super-scale pre-trained models, his group at Tsinghua University has been described as one that "seems to be the only academic institution in the world keeping up with big corporation labs like OpenAI, Google Brain, DeepMind, Baidu, etc. in large AI models".

Graphs: Professor Tang's research on the theory of network embedding has been significant in shaping the field of graph representation learning. He showed that several network embedding models (e.g., DeepWalk, LINE, PTE, and node2vec) can be theoretically unified into a single matrix factorization framework with closed-form expressions. This work, which laid a theoretical foundation for network embedding methods, was published at WSDM '18 and has received 900+ citations to date, making it the most cited WSDM publication of the last five years.

AMiner: Professor Tang is well known for building the academic social network search system ArnetMiner (now AMiner), which was launched in March 2006. Over the past 17 years (2006 to present), he has continuously developed innovative algorithms, scalable models, and transformative techniques to address fundamental problems and challenges in academic search and data mining. Specifically, his work has covered extracting user/scholar profiles from the Web (ICDM '07), the core architecture and techniques of AMiner (KDD '08, SIGKDD Test of Time Award in 2020), modeling scientific influence (ICDM '09 and KDD '09; TAP: 1,200+ citations), mining advisor-advisee relationships (KDD '10), topic-level academic network search (KDD '11), name recognition (ICDM '12, ICDM Contest Champion), network alignment (ICDM '17), name disambiguation (KDD '18), the Open Academic Graph (OAG) (KDD '19, Microsoft Research Collaborative Research Award), academic network pre-training (KDD '20), a heterogeneous academic network benchmark (KDD '21), the academic language model OAG-BERT (KDD '22), and the Web-scale name disambiguation benchmark WhoIsWho (KDD '23).

Over these 17 years, he has made AMiner a global service used by (1) 30 million+ individual academic users from 220 countries and regions; (2) numerous funding agencies, including the two largest government funding agencies in China (the NSF of China and the Ministry of Science and Technology of China); and (3) various conferences and journals (e.g., KDD, ICDM, NeurIPS, ACM TKDD, and IEEE TBD) for PC/reviewer assignment, conflict-of-interest (COI) identification, scholar profile extraction, and more. For his unique contributions through AMiner, he was named a Distinguished Young Scholar of the NSF of China, was awarded the XPLORER Prize, and received the SIGKDD Test of Time Award and the SIGKDD Service Award, among other honors.

2023 IEEE ICDM Nomination and Evaluation Committees

Xindong Wu (Co-Chair), Hefei University of Technology, China
Chengqi Zhang (Co-Chair), University of Technology Sydney, Australia
James Bailey, University of Melbourne, Australia
Diane Cook, Washington State University, USA
Peter Flach, University of Bristol, UK
Joydeep Ghosh, University of Texas at Austin, USA
Vipin Kumar, University of Minnesota, USA
Jian Pei, Simon Fraser University, Canada
Claudia Plant, University of Vienna, Austria

From Xindong Wu (xwu AT hfut.edu.cn) on October 11, 2023.