- Privacy-Preserving Technologies in Data
- Topic Modeling
- Video Surveillance and Tracking Methods
- Web Data Mining and Analysis
- Text and Document Classification Technologies
- Advanced Computational Techniques and Applications
- Information Retrieval and Search Behavior
- Privacy, Security, and Data Protection
- Image Enhancement Techniques
- Scientific Computing and Data Management
- Advanced Algorithms and Applications
- Spam and Phishing Detection
- Data Management and Algorithms
- Consumer Retail Behavior Studies
- Natural Language Processing Techniques
- Mobile Crowdsensing and Crowdsourcing
- Data Quality and Management
- Consumer Behavior in Brand Consumption and Identification
- Advanced Graph Neural Networks
- Cloud Computing and Resource Management
- Image and Signal Denoising Methods
- Cryptography and Data Security
- Geographic Information Systems Studies
- Neural Networks and Applications
- Machine Learning and ELM
Anhui University of Technology
2011-2024
Nanjing University of Aeronautics and Astronautics
2023-2024
Second Affiliated Hospital of Chengdu University of Traditional Chinese
2023
National University of Defense Technology
2020
Hangzhou Dianzi University
2020
Minzu University of China
2020
Southeast University
2020
Soochow University
2019
Monash University
2018
Tongji University
2018
Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly these "necessary" heuristics are that seem to cause good performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these functions satisfies all the constraints unconditionally....
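To make the idea of checking a retrieval function against a constraint concrete, here is a minimal sketch. The constraint used (at equal document length, more occurrences of a query term should score higher) and the particular TF-IDF formula are illustrative assumptions, not quoted from the paper; the function names are hypothetical.

```python
import math


def tf_idf_score(query_terms, doc_tokens, doc_freq, num_docs):
    """Sum of (1 + ln tf) * ln((N + 1) / df) over matched query terms.

    This is one common TF-IDF variant, chosen here only for illustration.
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue  # unmatched or unseen terms contribute nothing
        score += (1.0 + math.log(tf)) * math.log((num_docs + 1) / df)
    return score


def satisfies_tf_constraint(score_fn, term, length, low_tf, high_tf,
                            doc_freq, num_docs):
    """Check an assumed term-frequency constraint on one synthetic pair.

    Builds two documents of equal length that differ only in how often
    `term` occurs, and tests that the higher count scores strictly higher.
    Requires low_tf < high_tf <= length.
    """
    filler = "pad"  # hypothetical non-query token used to equalize length
    doc_low = [term] * low_tf + [filler] * (length - low_tf)
    doc_high = [term] * high_tf + [filler] * (length - high_tf)
    return (score_fn([term], doc_high, doc_freq, num_docs)
            > score_fn([term], doc_low, doc_freq, num_docs))
```

A single passing pair does not prove that a function satisfies the constraint in general; a check like this can only find violations or raise confidence.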
In most existing retrieval models, documents are scored primarily based on various kinds of term statistics such as within-document frequencies, inverse document frequencies, and document lengths. Intuitively, the proximity of matched query terms in a document can also be exploited to promote the scores of documents in which the matched terms are close to each other. Such a proximity heuristic, however, has been largely under-explored in the literature; it is unclear how we can model proximity and incorporate a proximity measure into an existing retrieval model. In this paper, we systematically explore the proximity heuristic. Specifically, we propose...
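As an illustration of what a proximity measure over matched query terms can look like, the sketch below computes the smallest positional distance between any two distinct query terms in a document. This is one simple candidate measure, assumed here for illustration; the abstract is truncated before it names the measures the paper proposes, and the function name is hypothetical.

```python
from itertools import combinations


def min_pair_distance(doc_tokens, query_terms):
    """Smallest position gap between two distinct matched query terms.

    Returns None when fewer than two distinct query terms occur in the
    document, since no pairwise distance is defined in that case.
    """
    wanted = set(query_terms)
    positions = {}
    for index, token in enumerate(doc_tokens):
        if token in wanted:
            positions.setdefault(token, []).append(index)
    if len(positions) < 2:
        return None
    best = None
    for term_a, term_b in combinations(positions, 2):
        for i in positions[term_a]:
            for j in positions[term_b]:
                gap = abs(i - j)
                if best is None or gap < best:
                    best = gap
    return best
```

A smaller value means the matched terms sit closer together, so a retrieval function could add a bonus that decreases as this distance grows.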
Fake news detection (FND) is a critical task in natural language processing (NLP) focused on identifying and mitigating the spread of misinformation. Large language models (LLMs) have recently shown remarkable abilities in understanding semantics and performing logical inference. However, their tendency to generate hallucinations poses significant challenges in accurately detecting deceptive content, leading to suboptimal performance. In addition, existing FND methods often underutilize the extensive prior knowledge...
Language model information retrieval depends on accurate estimation of document models. In this paper, we propose a document expansion technique to deal with the problem of insufficient sampling of documents. We construct a probabilistic neighborhood for each document, and expand the document with its neighborhood information. The expanded document provides a more accurate estimation of the document model, and thus improves retrieval accuracy. Moreover, since document expansion and pseudo feedback exploit different corpus structures, they can be combined to further improve performance. The experiment results on several data sets...
In this paper we investigate Chinese-English name transliteration using comparable corpora, that is, corpora where texts in the two languages deal with some of the same topics (and therefore share references to named entities) but are not translations of each other. We present two distinct methods for transliteration, one approach using phonetic transliteration and the second using the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining the approaches one can achieve even better results. We then propose a novel score propagation method...
In recent years, the Web has been rapidly "deepened" with the prevalence of databases online. On this deep Web, many sources are structured by providing structured query interfaces and results. Organizing such structured sources into a domain hierarchy is one of the critical steps toward the integration of heterogeneous Web sources. We observe that, for structured Web sources, query schemas (i.e., attributes in query interfaces) are discriminative representatives of the sources and thus can be exploited for source characterization. In particular, by viewing query schemas as a type of categorical...
Studies reported that if teachers can accurately predict students' follow-up learning effects via data mining and other means, as per their current performances, and explore the difficulty level of students' mastery of future-related courses in advance, it will help improve students' scores in future exams. Although educational data mining and learning analytics have experienced an increase in exploration and use, they are still difficult to precisely define. The usage of deep learning methods to predict academic performances and recommend optimal learning methods has not received considerable...
Integrating information in multiple natural languages is a challenging task that often requires manually created linguistic resources such as a bilingual dictionary or examples of direct translations of text. In this paper, we propose a general cross-lingual text mining method that does not rely on any of these resources, but can exploit comparable corpora to discover mappings between words and documents in different languages. Comparable corpora are collections of documents in different languages about similar topics; they are naturally available (e.g., news...
Web spam can significantly deteriorate the quality of search engines. Early web spamming techniques mainly manipulate page content. Since linkage information is widely used in web search, link-based spamming has also developed. So far, many techniques have been proposed to detect link spam. Those approaches are basically built on ranking methods.
Numerous extant image dehazing methods based on learning improve performance by increasing the depth or width of the network, by increasing the size of the convolution kernel, or by using the Transformer structure. However, this will inevitably introduce many parameters and increase the computational overhead. Therefore, we propose a lightweight dehazing framework, Dehaze-UNet, which has excellent dehazing performance and very low computational overhead, making it suitable for terminal deployment. To allow Dehaze-UNet to aggregate the features of haze, we design a LAYER module. This module mainly aggregates...
Haze obscures remote sensing images, making it difficult to extract valuable information. To address this problem, we propose a fine detail extraction network that aims to restore image details and improve image quality. Specifically, to capture fine details, we design multi-scale and multi-dimensional extraction blocks and then fuse them to optimize feature extraction. The multi-scale block adopts pixel attention and channel attention to combine global and local information from the image. Meanwhile, the multi-dimensional block uses depthwise separable convolutional layers to capture additional...
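The differential privacy histogram publishing method based on grouping cannot balance the grouping reconstruction error and the Laplace noise error, resulting in insufficient histogram publishing accuracy. To address this problem, we propose a symmetric histogram publishing method, DPHR (differential privacy histogram released). Firstly, the algorithm uses the exponential mechanism to sort the counts of the original histogram buckets globally to improve the grouping accuracy; secondly, we propose an optimal dynamic programming grouping algorithm based on the global minimum error, which uses the global minimum error as the evaluation function over the ordered histogram. This way, we can achieve global grouping of the ordered histogram while...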
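The trade-off named above, between reconstruction error from grouping and Laplace noise error, can be sketched as follows. This is not the DPHR algorithm: it omits the exponential-mechanism sorting and the dynamic programming grouping, and it assumes a sensitivity of 1 (adding or removing one record changes one bucket count by 1) and groups that are fixed independently of the data. All function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(counts, epsilon, rng):
    """Baseline: add Laplace(1 / epsilon) noise to every bucket."""
    return [c + laplace_noise(1.0 / epsilon, rng) for c in counts]


def grouped_histogram(counts, groups, epsilon, rng):
    """Publish each group's noisy sum, spread evenly over its buckets.

    `groups` is a list of half-open (start, end) index ranges that do not
    overlap. Noise per bucket shrinks by the group size, but buckets in a
    group lose their individual values (reconstruction error).
    """
    released = [0.0] * len(counts)
    for start, end in groups:
        size = end - start
        noisy_sum = sum(counts[start:end]) + laplace_noise(1.0 / epsilon, rng)
        for i in range(start, end):
            released[i] = noisy_sum / size
    return released


def reconstruction_error(counts, groups):
    """Squared error caused by replacing each bucket with its group mean."""
    error = 0.0
    for start, end in groups:
        mean = sum(counts[start:end]) / (end - start)
        error += sum((c - mean) ** 2 for c in counts[start:end])
    return error
```

Larger groups lower the noise per bucket but raise `reconstruction_error` whenever the grouped counts differ, which is why grouping similar counts together (for example after sorting) matters.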
Smart home devices generate a substantial amount of local data, and finding effective ways to utilize this data while ensuring privacy has become an increasingly pressing concern. Technologies such as Smart Homes, Federated Learning, and Blockchain offer promising solutions to address this challenge. We introduce a blockchain-based federated learning approach that leverages edge nodes to maintain a decentralized blockchain, thus mitigating the risks associated with single points of failure. Furthermore, the method utilizes...
Face hallucination aims to produce a high-resolution face image from an input low-resolution face image, which is of great importance for many practical face applications, such as face recognition and face verification. Since the structure of the face image is complex and sensitive, obtaining a super-resolved face image is more difficult than generic image super-resolution. Recently, with great success in high-level face recognition tasks, deep learning methods, especially generative adversarial networks (GANs), have also been applied to the low-level vision task of face hallucination. This...
In recent years, the amount of available data has been growing exponentially, and large-scale data is becoming ubiquitous. Machine learning is a key to deriving insight from this deluge of data. In this paper, we focus on large-scale data analysis, especially the classification of data, and propose an online conjugate gradient (CG) descent algorithm. Our algorithm draws from the improved Fletcher-Reeves (IFR) CG method proposed in Jiang and Jian [13] as well as an approach to reduce variance for stochastic gradient descent from Johnson and Zhang [15]. In theory, we prove that the proposed algorithm achieves a linear...
Supervised learning-based recommendation models, whose foundation is a sufficient supply of high-quality training samples, have been widely applied in many domains. In the era of big data, with explosive growth in data volume, training samples should be labelled timely and accurately to guarantee the excellent recommendation performance of these models. Machine annotation cannot complete the labelling tasks with high quality because of limited machine intelligence. Although expert annotation can achieve a high accuracy, it requires a long time as well as more resources. As a new way...