- Granular flow and fluidized beds
- Industrial Engineering and Technologies
- Privacy-Preserving Technologies in Data
- Topic Modeling
- Coal Combustion and Slurry Processing
- Engineering and Environmental Studies
- Mobile Crowdsensing and Crowdsourcing
- Recommender Systems and Techniques
- Software Engineering Research
- Natural Language Processing Techniques
- Semantic Web and Ontologies
Renmin University of China
2024
Language models have shown promising performance on the task of translating natural language questions into SQL queries (Text-to-SQL). However, most state-of-the-art (SOTA) approaches rely on powerful yet closed-source large language models (LLMs), such as ChatGPT and GPT-4, which may have the limitations of unclear model architectures, data privacy risks, and expensive inference overheads. To address these limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for...
Direct Preference Optimization (DPO) has proven effective in complex reasoning tasks like math word problems and code generation. However, when applied to Text-to-SQL datasets, it often fails to improve performance and can even degrade it. Our investigation reveals the root cause: unlike those tasks, which naturally integrate Chain-of-Thought (CoT) with DPO, Text-to-SQL datasets typically include only final answers (gold SQL queries) without detailed CoT solutions. By augmenting the datasets with synthetic CoT solutions, we achieve, for...
Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time- and effort-consuming. Despite the impressive capabilities of large language models like ChatGPT in generating programs by interacting with users through natural language prompts, there are still limitations. Specifically, a user must provide specific prompts to iteratively guide the model in improving the programs, which requires a certain level of expertise in programming, the dataset used, and the ML task....