DATAI 2025: Data-driven AI

London, United Kingdom

September, 2025

In conjunction with VLDB 2025

DATAI Workshop Schedule 2025 – September 5, 2025

Time	Type	Topic	Author
13:30–13:35	Opening	Opening Remarks: Welcome and Introduction to DATAI Workshop 2025	Workshop Chairs
13:35–14:05	Keynote	Data-centric Responsible AI from General ML to LLMs	Steven Euijong Whang
14:05–14:35	Keynote	Navigating Disruption: The Impact of AI Technologies on Data Integration Research	Ziawasch Abedjan
14:35–15:05	Invited Talk	Databases as AI Runtimes	Rihan Hai
15:05–15:15	Break	Coffee Break
15:15–15:25	Paper	SQL-ML: A SQL-Centric Framework for Building Efficient Feature Store	Ahmad Ghazal, Hanumath Maduri, Pekka Kostamaa
15:25–15:35	Paper	A Low Latency Cache for Cloud RDBMs	Guohai Zhang, Xin Tang, Qingchen Chang, Huanchen Zhang, Kai Hwang, Yuesen Li, Runhuai Huang, Teng Wang, Wusheng Zhang, Ming Zhang, Qingchun Chen, Xiaodong Hou, Qian Wang
15:35–15:45	Paper	The Case for Intent-Based Query Rewriting	Gianna Lisa Nicolai, Patrick Hansert, Sebastian Michel
15:45–15:55	Paper	Lightweight Pipelines: Good Enough is Sometimes Better	Camilla Sancricca, Cinzia Cappiello
15:55–16:05	Break	Coffee Break
16:05–16:35	Invited Talk	AI-Driven Data Typing: Toward Semantic and Functional Understanding of Relational Data	Chang Ge
16:35–16:45	Paper	CleanAgent: Automating Data Standardization with LLM-based Agents	Danrui Qi, Zhengjie Miao, Jiannan Wang
16:45–16:55	Paper	SoAgent: A Real-world Data Empowered Agent Pool to Facilitate LLM-Driven Generative Social Simulation	Na Ta, Kaiyu Li, Yushu Zhou, Yuhan Liu
16:55–17:05	Paper	DeepSearch: LLM-powered Data Acquisition for Machine Learning	Kaiyu Li, Zhongxin Hu, Yuxin Gao, Yuyang Wu
17:05–17:15	Paper	Detecting and Cleaning Errors in Personal Contact Information with Large Language Models	Anna-Christina Glock, Christine Dominka-Kiss, Philipp Korom, Lisa Ehrlinger

Abstract

The advent of artificial intelligence (AI), particularly through deep learning (DL) and large language models (LLMs), has marked a significant milestone in technological advancement, owing to its unparalleled accuracy and generalization abilities. The rapid evolution of AI model structures to achieve superior performance underscores the dynamic progression and potential of AI technologies. However, the cornerstone of any AI's success lies not just in its algorithmic prowess but in the quality of the data it is trained on. High-quality, accurate, consistent, and representative data sets are imperative for enhancing AI models' learning efficacy, thereby optimizing their generalization capabilities and reducing computational demands.

Beyond just leveraging quality data, AI technology itself plays a pivotal role in enhancing data quality through its powerful tools for data management. From cleaning, labeling, and validation to sophisticated feature engineering, AI ensures data accuracy, integrity, consistency, and reliability. This creates a symbiotic relationship between AI technology and high-quality data, highlighting their mutual dependence and the complementary nature of their interaction. It is this synergy that the 2nd International Workshop on Data-driven AI (DATAI) aims to explore, delving into the latest research breakthroughs and presenting innovative techniques and methodologies at the forefront of data-driven AI.

This workshop is dedicated to fostering a comprehensive understanding of the intricate relationship between AI technologies and the data they depend on, focusing on the development of high-quality data specifically tailored for AI technologies, with a particular emphasis on large-scale models. By engaging researchers, developers, and practitioners in rigorous discussions, the workshop seeks to explore sustained advancements, design innovations, and practical applications of data construction techniques that advance AI technologies.

Topics of Interest

Relevant topics include, but are not limited to:

Data discovery for AI.
AI (LLMs)-driven data discovery.
Data cleaning & integration for AI.
Data quality for AI in time series data.
AI for data systems.
AI (LLMs)-driven data cleaning & integration.
LLM-based data extraction.
AI (LLMs)-driven data transformation.
Data selection for AI, including LLM pre-training & SFT.
Data management during the lifecycle of AI models.
Labeling quality vs. AI performance.
LLM-based data labeling.
Data-efficient AI.

By fostering a collaborative environment, DATAI aims to inspire a diverse audience of participants from the realms of AI and data quality management, facilitating an exchange of ideas that propels the field toward groundbreaking developments.

Important Dates

Submission Deadline for Research Papers: June 01, 2025
Notification of Authors: June 20, 2025
Camera-ready Version of Accepted Papers: July 01, 2025

Paper Submission Methods

Papers must be submitted via the EasyChair conference system, accessible at the following link: EasyChair Cmt.

Historical Information

The 1st International Workshop on Data-driven AI (DATAI 2024)

Contact Information

For further inquiries, please contact the chairs at the email addresses provided in the official document [View PDF].

Hongzhi Wang: wangzh@hit.edu.cn
Nan Tang: nantang@hkust-gz.edu.cn
