The 1st International Workshop on Data-driven AI (DATAI 2024)

Workshop Schedule

Time	Topic	Author
09:00-09:30	The Story of Dataprep.ai	Jiannan Wang
09:30-09:45	Addressing Data Management Challenges for Interoperable Data Science	Ilin Tolovski, Tilmann Rabl
09:45-10:00	Missing Value Imputation via Pre-trained Language Models with Trainable Prompt and Retrieval Augmentation	Xiang Huang, Shuang Hao
10:00-10:10	Coffee Break
10:10-10:40	Accelerating Storage with Polar-Stack on Baidu AI Cloud	Li Shuo
10:40-10:55	LLM-assisted Labeling Function Generation for Semantic Type Detection	Chenjie Li, Dan Zhang, Jin Wang
10:55-11:10	Approximate Functional Dependencies Discovery Using Markov Blanket	Jinqi Liu, Anzhen Zhang, Jiajia Li, Na Guo, Jing Zhang

Abstract

The advent of artificial intelligence (AI), particularly through deep learning (DL) and large language models (LLMs), has marked a significant milestone in technological advancement, owing to its unparalleled accuracy and generalization abilities. The rapid evolution of AI model structures to achieve superior performance underscores the dynamic progression and potential of AI technologies. However, the cornerstone of any AI system's success lies not just in its algorithmic prowess but in the quality of the data it is trained on. High-quality, accurate, consistent, and representative data sets are imperative for enhancing AI models' learning efficacy, thereby optimizing their generalization capabilities and reducing computational demands.

Beyond just leveraging quality data, AI technology itself plays a pivotal role in enhancing data quality through its powerful tools for data management. From cleaning, labeling, and validation to sophisticated feature engineering, AI ensures data accuracy, integrity, consistency, and reliability. This creates a symbiotic relationship between AI technology and high-quality data, highlighting their mutual dependence and the complementary nature of their interaction. It is this synergy that the 1st International Workshop on Data-driven AI (DATAI) aims to explore, delving into the latest research breakthroughs and presenting innovative techniques and methodologies at the forefront of data-driven AI.

This workshop is dedicated to fostering a comprehensive understanding of the intricate relationship between AI technologies and the data they depend on, focusing on the development of high-quality data specifically tailored for AI technologies, with a particular emphasis on large-scale models. By engaging researchers, developers, and practitioners in rigorous discussions, the workshop seeks to explore sustained advancements, design innovations, and practical applications of data construction techniques that advance AI technologies.

Topics of Interest

Relevant topics include, but are not limited to:

Data discovery for ML
Data cleaning & integration for ML
Labeling quality and ML performance
Data-efficient solutions for ML training
LLM-based data cleaning & integration
Multi-modal data lakes (retrieval-)augmented large language models

By fostering a collaborative environment, DATAI aims to inspire a diverse audience of participants from the realms of AI and data quality management, facilitating an exchange of ideas that propels the field toward groundbreaking developments.

Contact Information

For further inquiries, please contact the chairs through the email addresses provided in the official document.

Hongzhi Wang: wangzh@hit.edu.cn
Nan Tang: nantang@hkust-gz.edu.cn

DATAI2024

Guangzhou, China
