NFDI4DS | UHH-SEMS - Publication Details

Fei Tan

ORCID: 0000-0002-3232-1912


About

Contact & Profiles

A5100740041

Research Areas

Topic Modeling
Natural Language Processing Techniques
Hate Speech and Cyberbullying Detection
Recommender Systems and Techniques
Housing Market and Economics
Spam and Phishing Detection
Multimodal Machine Learning Applications
Customer churn and segmentation
Data Mining Algorithms and Applications
Advanced Malware Detection Techniques
Domain Adaptation and Few-Shot Learning
Human Mobility and Location-Based Analysis
Data Quality and Management
Data Stream Mining Techniques
Text Readability and Simplification
Software Engineering Research
Geochemistry and Geologic Mapping
FinTech, Crowdfunding, Digital Finance
Web Data Mining and Analysis
Microfinance and Financial Inclusion
Biomedical Text Mining and Ontologies
Machine Learning in Healthcare
Adversarial Robustness in Machine Learning
Contact Mechanics and Variational Inequalities
Music and Audio Processing

Group Sense (China)
2023-2024

Yahoo (United States)
2019-2022

New Jersey Institute of Technology
2015-2020

Worcester Polytechnic Institute
2020

Yahoo (Spain)
2020

Twitter (United States)
2020

Institute of Rock and Soil Mechanics
2017

University of Delaware
2014-2015

Distinct microbiological signatures associated with triple negative breast cancer

OPENALEX - Publications

Sagarika Banerjee Zhi Wei Fei Tan Kristen N. Peck Natalie Shih and 4 more

Infectious agents are the third highest human cancer risk factor and may have a greater role in the origin and/or progression of cancers and related pathogenesis. Thus, knowing the specific viruses and microbial agents associated with a cancer type may provide insights into cause, diagnosis and treatment. We utilized a pan-pathogen array technology to identify the microbial signatures associated with triple negative breast cancer (TNBC). This technology detects low copy number and fragmented genomes extracted from formalin-fixed paraffin embedded archival tissues. The...

10.1038/srep15162 article EN cc-by Scientific Reports 2015-10-15

A Deep Learning Approach to Competing Risks Representation in Peer-to-Peer Lending


Fei Tan Xiurui Hou Jie Zhang Zhi Wei Zhen Yan

Online peer-to-peer (P2P) lending is expected to benefit both investors and borrowers due to their low transaction cost and the elimination of expensive intermediaries. From the lenders' perspective, maximizing their return on investment is an ultimate goal during their decision-making procedure. In this paper, we explore and address a fundamental problem underlying such a goal: how to represent the two competing risks, charge-off and prepayment, in funded loans. We propose to model both potential risks simultaneously, which remains largely...

10.1109/tnnls.2018.2870573 article EN IEEE Transactions on Neural Networks and Learning Systems 2018-10-10

Success Prediction on Crowdfunding with Multimodal Deep Learning


Chaoran Cheng Fei Tan Xiurui Hou Zhi Wei

We consider the problem of project success prediction on crowdfunding platforms. Although the information in a project profile can be of different modalities such as text, images, and metadata, most existing approaches leverage only the text dominated modality. Nowadays rich visual images have been utilized in more and more project profiles for attracting backers, but little work has been conducted to evaluate their effects towards success prediction. Moreover, meta information has been exploited in many existing approaches for improving prediction accuracy. However, such meta information is usually limited to the dynamics after...

10.24963/ijcai.2019/299 article EN 2019-07-28

HABERTOR: An Efficient and Effective Deep Hatespeech Detector


Thanh Tran Yifan Hu Changwei Hu Kevin Yen Fei Tan and 2 more

We present our HABERTOR model for detecting hatespeech in large scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture, but is different in four aspects: (i) it generates its own vocabularies and is pre-trained from scratch using the largest scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training...

10.18653/v1/2020.emnlp-main.606 article EN cc-by 2020-01-01

Time-Aware Latent Hierarchical Model for Predicting House Prices


Fei Tan Chaoran Cheng Zhi Wei

It is widely acknowledged that the value of a house is the mixture of a large number of characteristics. House price prediction thus presents a unique set of challenges in practice. While a large body of works are dedicated to this task, their performance and applications have been limited by the shortage of long time span of transaction data, the absence of real-world settings and the insufficiency of housing features. To this end, a time-aware latent hierarchical model is introduced to capture the underlying spatiotemporal interactions behind the evolution of house prices. The...

10.1109/icdm.2017.147 article EN 2017 IEEE International Conference on Data Mining (ICDM) 2017-11-01

PUnifiedNER: A Prompting-Based Unified NER System for Diverse Datasets


Jinghui Lu Rui Zhao Brian Mac Namee Fei Tan

Much of named entity recognition (NER) research focuses on developing dataset-specific models based on data from the domain of interest, and a limited set of related entity types. This is frustrating as each new dataset requires a new model to be trained and stored. In this work, we present a "versatile" model, the Prompting-based Unified NER system (PUnifiedNER), that works with data from different domains and can recognise up to 37 entity types simultaneously, and theoretically it could be as many as possible. By using prompt learning, PUnifiedNER...

10.1609/aaai.v37i11.26564 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference


Bang An Jie Lyu Zhenyi Wang Chunyuan Li Changwei Hu and 4 more

Bang An, Jie Lyu, Zhenyi Wang, Chunyuan Li, Changwei Hu, Fei Tan, Ruiyi Zhang, Yifan Hu, Changyou Chen. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.

10.18653/v1/2020.emnlp-main.17 article EN cc-by 2020-01-01

A Blended Deep Learning Approach for Predicting User Intended Actions


Fei Tan Zhi Wei Jun He Xiang Wu Bo Peng and 2 more

User intended actions are widely seen in many areas. Forecasting these actions and taking proactive measures to optimize business outcome is a crucial step towards sustaining steady business growth. In this work, we focus on predicting attrition, which is one of the typical user intended actions. Conventional attrition predictive modeling strategies suffer from a few inherent drawbacks. To overcome these limitations, we propose a novel end-to-end learning scheme to keep track of the evolution of attrition patterns for the predictive modeling. It integrates user activity logs,...

10.1109/icdm.2018.00064 article EN 2018 IEEE International Conference on Data Mining (ICDM) 2018-11-01

TNT: Text Normalization based Pre-training of Transformers for Content Moderation


Fei Tan Yifan Hu Changwei Hu Keqian Li Kevin Yen

In this work, we present a new language pre-training model TNT (Text Normalization based Pre-training of Transformers) for content moderation. Inspired by the masking strategy and text normalization, TNT is developed to learn language representation by training transformers to reconstruct text from four operation types typically seen in text manipulation: substitution, transposition, deletion, and insertion. Furthermore, the normalization involves the prediction of both operation types and token labels, enabling TNT to learn from more challenging tasks than the standard task of masked word...

10.18653/v1/2020.emnlp-main.383 article EN cc-by 2020-01-01

What Makes Pre-trained Language Models Better Zero-shot Learners?


Jinghui Lu Dongsheng Zhu Weidong Han Rui Zhao Brian Mac Namee and 1 more

Current methods for prompt learning in zero-shot scenarios widely rely on a development set with sufficient human-annotated data to select the best-performing prompt template a posteriori. This is not ideal because in a real-world zero-shot scenario of practical relevance, no labelled data is available. Thus, we propose a simple yet effective method for screening reasonable prompt templates in zero-shot text classification: Perplexity Selection (Perplection). We hypothesize that language discrepancy can be used to measure the efficacy of prompt templates, and...

10.18653/v1/2023.acl-long.128 article EN cc-by 2023-01-01

Elucidation of DNA methylation on N6-adenine with deep learning


Fei Tan Tian Tian Xiurui Hou Xiang Yu Lei Gu and 4 more

10.1038/s42256-020-0211-4 article EN Nature Machine Intelligence 2020-08-03

Improvement of contact calculation in spherical discontinuous deformation analysis


Long Wang Yu‐Yong Jiao Gang‐Hai Huang Fei Zheng Zhiye Zhao and 1 more

10.1007/s11431-016-0203-6 article EN Science China Technological Sciences 2017-01-05

Modeling Real Estate for School District Identification


Fei Tan Chaoran Cheng Zhi Wei

The affiliated school district of a real estate property is often a crucial concern. How to automate the identification of residential homes located in a favorable educational environment, however, is largely unexplored until now. The availability of heterogeneous real estate-related data offers a great opportunity for this task. Nevertheless, it is such heterogeneity that poses significant challenges to their amalgamation in a unified fashion. To this end, we develop a G-LRMM model to integrate digital price, textual comments, and...

10.1109/icdm.2016.0164 article EN 2016-12-01

Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference


Bang An Jie Lyu Zhenyi Wang Chunyuan Li Changwei Hu and 4 more

The neural attention mechanism plays an important role in many natural language processing applications. In particular, the use of multi-head attention extends single-head attention by allowing a model to jointly attend to information from different perspectives. Without explicit constraining, however, multi-head attention may suffer from attention collapse, an issue that makes different heads extract similar attentive features, thus limiting the model's representation power. In this paper, for the first time, we provide a novel understanding of multi-head attention from a Bayesian perspective. Based on...

10.48550/arxiv.2009.09364 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Modeling and elucidation of housing price


Fei Tan Chaoran Cheng Zhi Wei

10.1007/s10618-018-00612-0 article EN Data Mining and Knowledge Discovery 2019-02-08

What Makes Pre-trained Language Models Better Zero-shot Learners?


Jinghui Lu Rui Zhao Brian Mac Namee Dongsheng Zhu Weidong Han and 1 more

Current methods for prompt learning in zero-shot scenarios widely rely on a development set with sufficient human-annotated data to select the best-performing prompt template a posteriori. This is not ideal because in a real-world zero-shot scenario of practical relevance, no labelled data is available. Thus, we propose a simple yet effective method for screening reasonable prompt templates in zero-shot text classification: Perplexity Selection (Perplection). We hypothesize that language discrepancy can be used to measure the efficacy of prompt templates,...

10.48550/arxiv.2209.15206 preprint EN public-domain arXiv (Cornell University) 2022-01-01

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model


Hengyuan Zhang Yanru Wu Dawei Li Zacc Yang Rui Zhao and 2 more

Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's performance across diverse tasks. In response to this challenge, we propose CoFiTune, a coarse to fine framework in an attempt to strike the balance between speciality...

10.48550/arxiv.2404.10306 preprint EN arXiv (Cornell University) 2024-04-16

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model


Hengyuan Zhang Yanru Wu Dawei Li Sak Yang Rui Zhao and 2 more

10.18653/v1/2024.findings-acl.445 article EN Findings of the Association for Computational Linguistics: ACL 2024 2024-01-01

SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding


Chuanghao Ding Xuejing Liu Wei Tang Juan Li Xiaoliang Wang and 3 more

10.1145/3688866.3689125 article EN 2024-10-26

CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models


Jiawei Gu Zacc Yang Chuanghao Ding Rui Zhao Fei Tan

10.18653/v1/2024.emnlp-main.903 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

DeepVar: An End-to-End Deep Learning Approach for Genomic Variant Recognition in Biomedical Literature


Chaoran Cheng Fei Tan Zhi Wei

We consider the problem of Named Entity Recognition (NER) on biomedical scientific literature, and more specifically the genomic variants recognition in this work. Significant success has been achieved for NER on canonical tasks in recent years where large data sets are generally available. However, it remains a challenging problem on many domain-specific areas, especially the domains where only small gold annotations can be obtained. In addition, genomic variant entities exhibit diverse linguistic heterogeneity, differing much...

10.1609/aaai.v34i01.5399 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

User Response Driven Content Understanding with Causal Inference


Fei Tan Zhi Wei Abhishek Pani Zhen Yan

Content understanding, with many potential industrial applications, is spurring interest by researchers in many areas of artificial intelligence. We propose to revisit the content understanding problem in digital marketing from three novel perspectives. First, our problem is to explore how user experience is delivered with divergent key multimedia elements. Second, we treat content understanding as elucidating their causal implications in driving user responses. Third, we propose to understand the content based on observational audience visit logs. To approach this problem, we measure and...

10.1109/icdm.2019.00168 article EN 2019 IEEE International Conference on Data Mining (ICDM) 2019-11-01

CWSeg: An Efficient and General Approach to Chinese Word Segmentation


Dedong Li Rui Zhao Fei Tan

In this work, we report our efforts in advancing Chinese Word Segmentation for the purpose of rapid deployment in different applications. The pre-trained language model (PLM) based segmentation methods have achieved state-of-the-art (SOTA) performance, whereas this paradigm also poses challenges in the deployment. It includes the balance between performance and cost, segmentation ambiguity due to domain diversity and vague words boundary, and multi-grained segmentation. In this context, we propose a simple yet effective approach, namely...

10.18653/v1/2023.acl-industry.1 article EN cc-by 2023-01-01
