NFDI4DS | UHH-SEMS - Publication Details

Hanbing Liu

ORCID: 0009-0000-4582-3340

About

Contact & Profiles

OpenAlex ID: A5057511308

Research Areas

Granular flow and fluidized beds
Industrial Engineering and Technologies
Privacy-Preserving Technologies in Data
Topic Modeling
Coal Combustion and Slurry Processing
Engineering and Environmental Studies
Mobile Crowdsensing and Crowdsourcing
Recommender Systems and Techniques
Software Engineering Research
Natural Language Processing Techniques
Semantic Web and Ontologies

Renmin University of China
2024

CodeS: Towards Building Open-source Language Models for Text-to-SQL

OPENALEX - Publications

Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, and 5 more

Language models have shown promising performance on the task of translating natural language questions into SQL queries (Text-to-SQL). However, most state-of-the-art (SOTA) approaches rely on powerful yet closed-source large language models (LLMs), such as ChatGPT and GPT-4, which may have limitations such as unclear model architectures, data privacy risks, and expensive inference overheads. To address these limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for...

DOI: 10.1145/3654930 | article | EN | Proceedings of the ACM on Management of Data | 2024-05-29

Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL

Hanbing Liu, Haoyang Li, Xiaokang Zhang, Rong-Shi Chen, Haiyong Xu, and 3 more

Direct Preference Optimization (DPO) has proven effective in complex reasoning tasks like math word problems and code generation. However, when applied to Text-to-SQL datasets, it often fails to improve performance and can even degrade it. Our investigation reveals the root cause: unlike math and code tasks, which naturally integrate Chain-of-Thought (CoT) reasoning with DPO, Text-to-SQL datasets typically include only final answers (gold SQL queries) without detailed CoT solutions. By augmenting Text-to-SQL datasets with synthetic CoT solutions, we achieve, for...

DOI: 10.48550/arxiv.2502.11656 | preprint | EN | arXiv (Cornell University) | 2025-02-17

ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions

Sibei Chen, Hanbing Liu, Waiting Jin, Xiangyu Sun, Xiaoyao Feng, and 3 more

Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time- and effort-consuming. Despite the impressive capabilities of large language models like ChatGPT in generating programs by interacting with users through natural language prompts, there are still limitations. Specifically, a user must provide specific prompts to iteratively guide ChatGPT in improving data preparation programs, which requires a certain level of expertise in programming, the dataset used, and the ML task....

DOI: 10.1145/3626246.3654727 | article | EN | 2024-05-23
