- Topic Modeling
- Natural Language Processing Techniques
- Mobile Crowdsensing and Crowdsourcing
- Biomedical Text Mining and Ontologies
- Semantic Web and Ontologies
- Advanced Text Analysis Techniques
- Advanced Graph Neural Networks
- Web Data Mining and Analysis
- Data Quality and Management
- Text and Document Classification Technologies
- Expert finding and Q&A systems
- Privacy-Preserving Technologies in Data
- Data Stream Mining Techniques
- Sentiment Analysis and Opinion Mining
- Speech and dialogue systems
- Recommender Systems and Techniques
- Gene expression and cancer classification
- Software Engineering Research
- Bioinformatics and Genomic Networks
- Electronic Health Records Systems
- Advanced Computational Techniques and Applications
- Domain Adaptation and Few-Shot Learning
- Forest Biomass Utilization and Management
- Software Reliability and Analysis Research
- Complex Network Analysis Techniques
Tsinghua University
2013-2024
Nanning Normal University
2024
Qiqihar Medical University
2024
Zhongyuan University of Technology
2024
Shanghai CASB Biotechnology (China)
2024
Iowa State University
2013-2023
Zhejiang University-University of Edinburgh Institute
2022-2023
Shaoxing University
2020-2023
Shandong Normal University
2022-2023
Xijing University
2023
We present an incremental joint framework to simultaneously extract entity mentions and relations using a structured perceptron with efficient beam search. A segment-based decoder based on the idea of a semi-Markov chain is adopted in the new framework, as opposed to traditional token-based tagging. In addition, by virtue of the inexact search, we developed a number of effective global features as soft constraints to capture the interdependency among entity mentions and relations. Experiments on Automatic Content Extraction (ACE) corpora demonstrate that...
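The segment-based beam search described above can be sketched as follows. This is a minimal illustration, not the paper's decoder: `beam_segment_decode`, `toy_score`, and `GAZETTEER` are hypothetical names, the scores are hand-set toy values rather than structured-perceptron weights, and only entity segments are decoded (no relations or global features). `beams[i]` keeps the top-k partial segmentations that cover exactly the first `i` tokens, and each step extends them with one whole labeled segment instead of one token tag.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def beam_segment_decode(
    tokens: List[str],
    labels: List[str],
    score: Callable[[List[str], int, int, str, str], float],
    max_len: int = 3,
    beam_size: int = 4,
) -> List[Segment]:
    """Semi-Markov beam search: hypotheses grow by whole segments."""
    n = len(tokens)
    beams: List[List[Tuple[float, List[Segment]]]] = [[] for _ in range(n + 1)]
    beams[0] = [(0.0, [])]
    for end in range(1, n + 1):
        candidates = []
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for total, segs in beams[start]:
                prev = segs[-1][2] if segs else "<START>"
                for label in labels:
                    # Non-entity tokens ("O") are kept as single-token segments.
                    if label == "O" and length > 1:
                        continue
                    s = score(tokens, start, end, label, prev)
                    candidates.append((total + s, segs + [(start, end, label)]))
        # Keep only the best beam_size hypotheses ending at this position.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[end] = candidates[:beam_size]
    return beams[n][0][1] if beams[n] else []


# Hypothetical toy scorer: rewards segments found in a tiny gazetteer.
GAZETTEER = {("Tsinghua", "University"): "ORG", ("Beijing",): "GPE"}


def toy_score(tokens, start, end, label, prev):
    # This toy scorer ignores `prev`; a learned model would use it for transition features.
    if GAZETTEER.get(tuple(tokens[start:end])) == label:
        return 2.0
    return 0.1 if label == "O" else -1.0


print(beam_segment_decode(["Tsinghua", "University", "is", "in", "Beijing"],
                          ["O", "ORG", "GPE"], toy_score))
```

Because a whole span such as "Tsinghua University" is scored as one unit, segment-level features (for example, gazetteer matches over the full span) are available directly, which token-level tagging can only approximate.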
In many real world applications, the same item may be described by multiple sources. As a consequence, conflicts among these sources are inevitable, which leads to an important task: how to identify which piece of information is trustworthy, i.e., the truth discovery task. Intuitively, if a piece of information is from a reliable source, then it is more trustworthy, and a source that provides trustworthy information is more reliable. Based on this principle, truth discovery approaches have been proposed to infer source reliability degrees and the most trustworthy information (i.e., the truth) simultaneously. However, existing...
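The mutual-reinforcement principle above can be sketched for categorical claims as an alternating loop: a reliability-weighted vote updates the truths, and each source's weight is then set to the log of the total error over its own error, one common choice in the truth discovery literature. The function name, data layout, and toy data are hypothetical; this is not the exact formulation of the work described here.

```python
import math
from collections import defaultdict
from typing import Any, Dict, Tuple

Claims = Dict[Tuple[str, str], Any]  # (source, item) -> claimed value


def truth_discovery(claims: Claims, n_iter: int = 10, eps: float = 1e-9):
    sources = sorted({s for s, _ in claims})
    items = sorted({o for _, o in claims})
    weights = {s: 1.0 for s in sources}  # start from plain majority voting
    truths: Dict[str, Any] = {}
    for _ in range(n_iter):
        # Truth update: reliability-weighted vote per item (ties broken by value).
        votes = defaultdict(lambda: defaultdict(float))
        for (s, o), v in claims.items():
            votes[o][v] += weights[s]
        truths = {o: max(votes[o].items(), key=lambda kv: (kv[1], kv[0]))[0]
                  for o in items}
        # Reliability update: fewer disagreements with the truths -> larger weight.
        loss = {s: eps for s in sources}
        for (s, o), v in claims.items():
            if v != truths[o]:
                loss[s] += 1.0
        total = sum(loss.values())
        weights = {s: math.log(total / loss[s]) for s in sources}
    return truths, weights


claims = {
    ("A", "x1"): "a", ("A", "x2"): "b", ("A", "x3"): "c", ("A", "x4"): "p",
    ("B", "x1"): "a", ("B", "x2"): "b", ("B", "x3"): "c",
    ("C", "x1"): "z", ("C", "x2"): "y", ("C", "x3"): "c", ("C", "x4"): "q",
}
truths, weights = truth_discovery(claims)
print(truths)  # x4 resolves to "p" once A is recognized as more reliable than C
```

In the toy data, plain voting ties on `x4`; after one round, source C's disagreements on `x1` and `x2` lower its weight, so A's claim wins.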
In the crowdsourced data aggregation task, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge for this task is to estimate source reliability and select answers provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all questions, but ignore the fact that sources' reliability may vary significantly among different topics. To capture...
In the era of big data, information regarding the same objects can be collected from increasingly more sources. Unfortunately, there usually exist conflicts among the information coming from different sources. To tackle this challenge, truth discovery, i.e., integrating multi-source noisy information by estimating the reliability of each source, has emerged as a hot topic. In many real world applications, however, the information may come sequentially, and as a consequence, the truths of objects as well as the reliability of sources may be dynamically evolving. Existing truth discovery methods, unfortunately, cannot handle...
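One way to handle sequentially arriving data is to keep a per-source error that decays over time, so that old mistakes count less than recent ones. The sketch below is a minimal illustration under that assumption; the class name, the `decay` parameter, and the single-pass-per-batch design are hypothetical and are not the specific method of the work described here.

```python
import math
from collections import defaultdict
from typing import Any, Dict, Tuple


class StreamingTruthDiscovery:
    """Hypothetical incremental truth discovery with exponentially decayed source error."""

    def __init__(self, decay: float = 0.5, eps: float = 1e-9) -> None:
        self.decay = decay  # in (0, 1]; smaller values forget old errors faster
        self.eps = eps
        self.loss: Dict[str, float] = {}  # decayed cumulative error per source

    def process_batch(self, claims: Dict[Tuple[str, str], Any]) -> Dict[str, Any]:
        """claims: {(source, item): value} observed at one timestamp."""
        for source, _ in claims:
            self.loss.setdefault(source, 0.0)
        # Weights come only from past batches, so each batch needs a single pass.
        total = sum(err + self.eps for err in self.loss.values())
        weight = {s: math.log(total / (err + self.eps)) for s, err in self.loss.items()}
        votes = defaultdict(lambda: defaultdict(float))
        for (source, item), value in claims.items():
            votes[item][value] += weight[source]
        # Values must be mutually comparable for the deterministic tie-break.
        truths = {item: max(c.items(), key=lambda kv: (kv[1], kv[0]))[0]
                  for item, c in votes.items()}
        # Forget part of the old evidence, then add this batch's errors.
        for s in self.loss:
            self.loss[s] *= self.decay
        for (source, item), value in claims.items():
            self.loss[source] += float(value != truths[item])
        return truths


td = StreamingTruthDiscovery(decay=0.5)
print(td.process_batch({("A", "x"): 1, ("B", "x"): 1, ("C", "x"): 2}))
print(td.process_batch({("A", "y"): 5, ("C", "y"): 6}))  # A outweighs C
```

At the first timestamp all sources are weighted equally; C's error there makes A win the one-against-one conflict at the next timestamp, while the decay lets C regain weight if it later becomes accurate.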
We leverage crowd wisdom for multiple-choice question answering, and employ lightweight machine learning techniques to improve the aggregation accuracy of crowdsourced answers to these questions. In order to develop more effective aggregation methods and evaluate them empirically, we developed and deployed a system for playing the “Who wants to be a millionaire?” quiz show. Analyzing our data (which consist of more than 200,000 answers), we find that by just going with the most selected answer in the aggregation, we can answer over 90% of the questions correctly,...
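The "most selected answer" baseline mentioned above amounts to a per-question majority vote. A minimal sketch, assuming a hypothetical data layout of question IDs mapped to the options chosen by the crowd (the learned aggregation methods of the work itself are not shown):

```python
from collections import Counter
from typing import Dict, List


def majority_vote(answers: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick the most selected option for each question."""
    result = {}
    for qid, picks in answers.items():
        counts = Counter(picks)
        # Highest count wins; ties are broken deterministically by option label.
        result[qid] = min(counts, key=lambda opt: (-counts[opt], opt))
    return result


def accuracy(pred: Dict[str, str], gold: Dict[str, str]) -> float:
    return sum(pred[q] == gold[q] for q in gold) / len(gold)


answers = {"q1": ["A", "B", "A", "C"], "q2": ["D", "D", "B"], "q3": ["B", "C"]}
gold = {"q1": "A", "q2": "D", "q3": "C"}
pred = majority_vote(answers)
print(pred, accuracy(pred, gold))
```

Questions with a tie or a split crowd, like `q3` here, are where a learned aggregator has room to improve on this baseline.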
In this paper, we propose a new framework that unifies the output of three information extraction (IE) tasks (entity mentions, relations, and events) as an information network representation, and extracts all of them using one single joint model based on structured prediction. This novel formulation allows different parts of the information network to fully interact with each other. For example, many relations can now be considered as the resultant states of events. Our approach achieves substantial improvements over traditional pipelined approaches,...
In many applications, one can obtain descriptions about the same objects or events from a variety of sources. As a result, this will inevitably lead to data or information conflicts. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually unknown which...
With the proliferation of sensor-rich mobile devices, crowd sensing has emerged as a new paradigm of collecting information from the physical world. However, the sensory data provided by participating workers are usually not reliable. In order to identify truthful values from the data, the topic of truth discovery, whose goal is to estimate each worker's reliability and infer the underlying truths through weighted data aggregation, is widely studied. Since truth discovery incorporates workers' reliability into the aggregation procedure, it shows...
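For continuous sensory readings, the weighted aggregation described above can be sketched as a reliability-weighted mean that alternates with a reliability update based on squared deviation from the current truths. This is a minimal illustration under stated assumptions: the function name and toy readings are hypothetical, per-object normalization of the error is omitted, and the privacy-preserving aspects of the work are not shown.

```python
import math
from typing import Dict, Tuple


def continuous_truth_discovery(readings: Dict[Tuple[str, str], float],
                               n_iter: int = 20, eps: float = 1e-9):
    """readings: {(worker, object): value}"""
    workers = sorted({w for w, _ in readings})
    objects = sorted({o for _, o in readings})
    weight = {w: 1.0 for w in workers}  # first pass is a plain average
    truth: Dict[str, float] = {}
    for _ in range(n_iter):
        # Truth update: reliability-weighted average of each object's readings.
        for o in objects:
            pairs = [(weight[w], v) for (w, obj), v in readings.items() if obj == o]
            truth[o] = sum(wt * v for wt, v in pairs) / sum(wt for wt, _ in pairs)
        # Reliability update: larger squared deviation -> smaller weight.
        loss = {w: eps for w in workers}
        for (w, o), v in readings.items():
            loss[w] += (v - truth[o]) ** 2
        total = sum(loss.values())
        # The eps floor keeps every weight positive, so no division by zero above.
        weight = {w: max(math.log(total / loss[w]), eps) for w in workers}
    return truth, weight


readings = {
    ("w1", "o1"): 20.0, ("w1", "o2"): 30.0,
    ("w2", "o1"): 20.2, ("w2", "o2"): 29.8,
    ("w3", "o1"): 26.0, ("w3", "o2"): 24.0,  # hypothetical faulty worker
}
truth, weight = continuous_truth_discovery(readings)
print(truth, weight)
```

The plain average of `o1` would be about 22.07; after a few iterations the faulty worker's weight collapses and the estimate settles between the two consistent readings.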
Large language models (LLMs) have exhibited powerful capability in various natural language processing tasks. This work focuses on exploring LLM performance on zero-shot information extraction, with a focus on ChatGPT and the named entity recognition (NER) task. Inspired by the remarkable reasoning capability of LLMs on symbolic and arithmetic reasoning, we adapt prevalent reasoning methods to NER and propose reasoning strategies tailored for NER. First, we explore a decomposed question-answering paradigm by breaking down the NER task into simpler subproblems by labels. Second,...
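The label-wise decomposition described above can be sketched as one question per entity type, with the per-label answers merged afterwards. In this minimal sketch, `decomposed_ner`, the prompt wording, the semicolon answer format, and the stub `fake_ask` are all hypothetical; `ask` stands in for whatever LLM call is used, and no real model API is assumed.

```python
from typing import Callable, Dict, List


def decomposed_ner(sentence: str, labels: List[str],
                   ask: Callable[[str], str]) -> Dict[str, List[str]]:
    """Ask one simpler question per entity label and merge the answers."""
    results: Dict[str, List[str]] = {}
    for label in labels:
        prompt = (f"Sentence: {sentence}\n"
                  f"Question: Which spans in the sentence are {label} entities? "
                  "Answer with a semicolon-separated list, or 'None'.")
        reply = ask(prompt).strip()
        spans = [] if reply.lower() == "none" else [
            s.strip() for s in reply.split(";") if s.strip()]
        # Keep only spans that literally occur in the sentence (drops hallucinations).
        results[label] = [s for s in spans if s in sentence]
    return results


def fake_ask(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call, used only for illustration.
    if "PERSON" in prompt:
        return "Alice Smith"
    if "ORGANIZATION" in prompt:
        return "Iowa State University; Mars Corp"
    return "None"


print(decomposed_ner("Alice Smith works at Iowa State University.",
                     ["PERSON", "ORGANIZATION", "LOCATION"], fake_ask))
```

Each sub-question only asks about one label, which is the simplification the decomposed paradigm relies on; the final filter shows one cheap way to discard spans the model invents.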
Thanks to the information explosion, data for the objects of interest can be collected from increasingly more sources. However, for the same object, there usually exist conflicts among the collected multi-source information. To tackle this challenge, truth discovery, which integrates multi-source noisy information by estimating the reliability of each source, has emerged as a hot topic. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this survey, we focus on providing...
As an effective way to solicit useful information from the crowd, crowdsourcing has emerged as a popular paradigm to solve challenging tasks. However, the data provided by participating workers are not always trustworthy. In the real world, there may exist malicious workers in crowdsourcing systems who conduct data poisoning attacks for the purpose of sabotage or financial rewards. Although data aggregation methods such as majority voting are conducted on workers' labels in order to improve data quality, they are vulnerable because they treat all workers equally. To capture the variety...
Dian Yu, Luheng He, Yuan Zhang, Xinya Du, Panupong Pasupat, Qi Li. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
This study uses deep-learning models to predict city partition crime counts on specific days. It helps police enhance surveillance, gather intelligence, and proactively prevent crimes. We formulate crime count prediction as a spatiotemporal sequence challenge, where both the input data and the prediction targets are sequences. In order to improve the accuracy of crime forecasting, we introduce a new model that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. We conducted a comparative analysis to assess...
Genotype imputation is a critical preprocessing step in genome-wide association studies (GWAS), enhancing statistical power for detecting associated single nucleotide polymorphisms (SNPs) by increasing marker size. In response to the needs of researchers seeking user-friendly graphical tools without requiring informatics or computer expertise, we have developed weIMPUTE, a web-based graphical user interface (GUI) for genotype imputation. Unlike existing genotype imputation software, weIMPUTE supports multiple tools, including SHAPEIT, Eagle,...
Influenza-like illness (ILI) continues to present significant challenges to global health, highlighting the need for accurate forecasting to guide timely public health responses. Traditional statistical and deep learning models, though widely applied, often face difficulties in capturing complex nonlinear dynamics and addressing data scarcity. This study examines the potential of fine-tuned large language models (LLMs), including Llama2 and GPT2, for multi-step influenza forecasting. A specialized fine-tuning...
Traditional isolated monolingual name taggers tend to yield inconsistent results across two languages. In this paper, we propose two novel approaches to jointly and consistently extract names from parallel corpora. The first approach uses standard linear-chain Conditional Random Fields (CRFs) as the learning framework, incorporating cross-lingual features propagated between the two languages. The second approach is based on a joint CRFs model that decodes sentence pairs, incorporating bilingual factors based on word alignment. Experiments on Chinese-English...
In this paper, we present GDA, a generalized decision aggregation framework that integrates information from distributed sensor nodes for decision making in a resource-efficient manner. Traditional approaches that target similar problems only take as input the discrete label information from individual sensors that observe the same events. Different from them, our proposed GDA is able to take advantage of the confidence of each sensor about its decision, and thus achieves higher decision accuracy. Targeting generalized problem domains, GDA can naturally handle scenarios where...
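The benefit of using each sensor's confidence rather than only its discrete label can be shown with a small contrast. This is a generic naive-Bayes-style log-odds fusion for a binary decision, not GDA itself; the function names, labels, and confidence values are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple

Decision = Tuple[str, float]  # (label, sensor's confidence in that label)


def discrete_vote(decisions: List[Decision]) -> str:
    """Uses only the labels, as in label-only aggregation."""
    counts = Counter(label for label, _ in decisions)
    return min(counts, key=lambda lab: (-counts[lab], lab))


def confidence_vote(decisions: List[Decision]) -> str:
    """Sums the log-odds of each sensor's confidence in its own label.
    Assumes a binary decision and 0 < confidence < 1."""
    score = defaultdict(float)
    for label, conf in decisions:
        score[label] += math.log(conf / (1.0 - conf))
    return max(score, key=lambda lab: (score[lab], lab))


decisions = [("intruder", 0.55), ("intruder", 0.55), ("clear", 0.99)]
print(discrete_vote(decisions), confidence_vote(decisions))
```

Two barely confident sensors outvote one highly confident sensor when only labels are counted; weighting by confidence reverses the outcome.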
Ofer Bronstein, Ido Dagan, Qi Li, Heng Ji, Anette Frank. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.
Applications of multiphoton processes in lanthanide-doped nanophosphors (NPs) are often limited by relatively weak and narrow absorbance. Here, the concept of an ultimate photosensitization of NPs by aggregation-induced enhanced emission (AIEE) dyes to overcome this limitation is introduced. Because AIEE dyes do not suffer from concentration quenching, they can fully cover the NP surface at high density to maximize absorbance while passivating the surface. This concept is applied to down-conversion quantum cutting. Specifically,...