NFDI4DS | UHH-SEMS - Publication Details

Variational Deep Semantic Hashing for Text Documents

OPENALEX - Publications

Suthee Chaidaroon, Yi Fang

As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original data samples by compact binary codes through hashing. A spectrum of machine learning methods have been utilized, but they often lack expressiveness and flexibility in modeling to learn effective representations. The recent advances of deep learning in a wide range of applications have demonstrated its capability to learn robust...

10.1145/3077136.3080816 preprint EN Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 2017-07-28

Deep Semantic Text Hashing with Weak Supervision

Suthee Chaidaroon, Travis Ebesu, Yi Fang

With an ever-increasing amount of data available on the web, fast similarity search has become a critical component for large-scale information retrieval systems. One solution is semantic hashing, which designs binary codes to accelerate similarity search. Recently, deep learning has been successfully applied to the semantic hashing problem and produces high-quality compact binary codes compared to traditional methods. However, most state-of-the-art semantic hashing approaches require large amounts of hand-labeled training data, which are often expensive and time consuming to collect....

10.1145/3209978.3210090 article EN 2018-06-27

Semantic Retrieval at Walmart

Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, and 6 more

In product search, the retrieval of candidate products before re-ranking is more mission critical and challenging than in other types of search, such as web search, especially for tail queries, which have a complex and specific search intent. In this paper, we present a hybrid system for e-commerce search deployed at Walmart that combines a traditional inverted index and embedding-based neural retrieval to better answer user tail queries. Our system significantly improved the relevance of the search engine, as measured by both offline and online evaluations. The improvements were achieved...

10.1145/3534678.3539164 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12

A Multi-task Learning Framework for Product Ranking with BERT

Xuyang Wu, Alessandro Magnani, Suthee Chaidaroon, Ajit Puthenputhussery, Ciya Liao, and 1 more

Product ranking is a crucial component for many e-commerce services. One of the major challenges in product search is the vocabulary mismatch between query and products, which may be a larger vocabulary gap problem compared to other information retrieval domains. While there is a growing collection of neural learning-to-match methods aimed specifically at overcoming this issue, they do not leverage the recent advances of large language models for product search. On the other hand, product ranking often deals with multiple types of engagement signals such as clicks,...

10.1145/3485447.3511977 article EN Proceedings of the ACM Web Conference 2022 2022-04-25

Neural Compatibility Ranking for Text-based Fashion Matching

Suthee Chaidaroon, Yi Fang, Min Xie, Alessandro Magnani

When shopping for fashion, customers often look for products which can complement their current outfit. For example, customers may want to buy a jacket that goes well with their jeans and sneakers. To address the task of fashion matching, we propose a neural compatibility model for ranking fashion products based on their compatibility matching with the input outfit. The contribution of our work is twofold. First, we demonstrate that product descriptions contain rich information about product comparability which has not been fully utilized in prior work. Secondly, we exploit such useful information from text data by...

10.1145/3331184.3331365 article EN 2019-07-18

node2hash: Graph aware deep semantic text hashing

Suthee Chaidaroon, Dae Hoon Park, Yi Chang, Yi Fang

10.1016/j.ipm.2019.102143 article EN Information Processing & Management 2019-11-02

Variational Deep Semantic Hashing for Text Documents

Suthee Chaidaroon, Yi Fang

As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original data samples by compact binary codes through hashing. A spectrum of machine learning methods have been utilized, but they often lack expressiveness and flexibility in modeling to learn effective representations. The recent advances of deep learning in a wide range of applications have demonstrated its capability to learn robust...

10.48550/arxiv.1708.03436 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Semantic Retrieval at Walmart

Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, and 6 more

In product search, the retrieval of candidate products before re-ranking is more critical and challenging than in other types of search, such as web search, especially for tail queries, which have a complex and specific search intent. In this paper, we present a hybrid system for e-commerce search deployed at Walmart that combines a traditional inverted index and embedding-based neural retrieval to better answer user tail queries. Our system significantly improved the relevance of the search engine, as measured by both offline and online evaluations. The improvements were achieved through...

10.48550/arxiv.2412.04637 preprint EN arXiv (Cornell University) 2024-12-05

Constrained Decoding with Speculative Lookaheads

Nishanth Nakshatri, Shamik Roy, Raj Das, Suthee Chaidaroon, Leonid Boytsov, and 1 more

Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead roll-out operations for each generated token make CDLH prohibitively expensive, resulting in low adoption in practice. In contrast, common decoding strategies such as greedy decoding are extremely efficient, but achieve very low constraint satisfaction. We propose constrained decoding with speculative lookaheads (CDSL), a technique that significantly improves upon the inference...

10.48550/arxiv.2412.10418 preprint EN arXiv (Cornell University) 2024-12-09

Improving Programming Q&A with Neural Generative Augmentation

Suthee Chaidaroon, Xiao Zhang, Shruti Subramaniyam, Jeffrey Svajlenko, Tanya Shourya, and 2 more

Knowledge-intensive programming Q&A is an active research area in industry. Its application boosts developer productivity by aiding developers in quickly finding programming answers from the vast amount of information on the Internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS to tackle programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the Internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS by utilizing reformulated queries with a novel triplet loss. We further...

10.1145/3539618.3591860 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18