Derong Shen

ORCID: 0000-0003-0310-6372
Research Areas
  • Data Quality and Management
  • Web Data Mining and Analysis
  • Topic Modeling
  • Advanced Database Systems and Queries
  • Advanced Graph Neural Networks
  • Data Management and Algorithms
  • Caching and Content Delivery
  • Semantic Web and Ontologies
  • Privacy-Preserving Technologies in Data
  • Complex Network Analysis Techniques
  • Service-Oriented Architecture and Web Services
  • Distributed and Parallel Computing Systems
  • Recommender Systems and Techniques
  • Peer-to-Peer Network Technologies
  • Data Mining Algorithms and Applications
  • Cloud Computing and Resource Management
  • Cloud Data Security Solutions
  • Text and Document Classification Technologies
  • Advanced Computational Techniques and Applications
  • Graph Theory and Algorithms
  • Image Retrieval and Classification Techniques
  • Blockchain Technology Applications and Security
  • Advanced Image and Video Retrieval Techniques
  • Web visibility and informetrics
  • Advanced Data Storage Technologies

Northeastern University
2016-2025

Universidad del Noreste
2023

Shenyang Institute of Automation
2010

Shanghai University of Electric Power
2010

Ministry of Education of the People's Republic of China
2010

Northeastern University
2005-2008

Software602 (Czechia)
2007

Neusoft (China)
2007

Institute of Software
2006

Chinese Academy of Sciences
2006

Data imbalance is a common phenomenon in machine learning. In imbalanced data classification, minority samples are far fewer than majority samples, which makes it difficult for minority samples to be effectively learned by classifiers. The synthetic minority oversampling technique (SMOTE) improves the sensitivity of classifiers to minority samples by synthesizing minority samples without repetition. However, the process of synthesizing new samples in the SMOTE algorithm may lead to problems such as "noisy samples" and "boundary samples." Based on the above description, we propose based Gaussian...

10.1109/tnnls.2022.3197156 article EN IEEE Transactions on Neural Networks and Learning Systems 2022-08-19

With the continuous development of blockchain technology, many applications, such as digital currencies in the form of tokens, are deployed. However, due to the lack of inter-blockchain transmission methods for data and value, these chain applications form islands. Therefore, cross-chain technology is proposed and applied to distributed transaction platforms, finance, e-government, and other different fields. Although just started, its gradually improving meet needs system. At the same time, they have already realized asset...

10.1109/access.2022.3228535 article EN cc-by IEEE Access 2022-12-12

Column semantic-type detection is a crucial task for data integration and schema matching, particularly when dealing with large volumes of unlabeled tabular data. Existing methods often rely on supervised learning models, which require extensive labeled data. In this paper, we propose SNMatch, an unsupervised approach based on a Siamese network for detecting column semantic types without training data. The novelty of SNMatch lies in its ability to generate the embeddings of columns by considering both format features...

10.3390/math13040607 article EN cc-by Mathematics 2025-02-13

Accurate short-term load forecasting plays an important role in the operation of modern power systems and in economic development. However, short-term load forecasting is affected by multiple factors, and due to the complexity of the relationships between factors, the graph structure in this task is unknown. On the other hand, existing methods do not fully aggregate data information through the inherent relationships between various factors. In this paper, we propose a short-term load forecasting framework based on graph neural networks and dilated 1D-CNN, called GLFN-TC. GLFN-TC uses learning module...

10.1007/s41019-023-00233-8 article EN cc-by Data Science and Engineering 2023-11-20

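Survey on NoSQL for Management of Big Data. Fund Project: National Basic Research Program of China (973 Program) (2012CB316201); National Natural Science Foundation of China (61033007, 61003060). Abstract: To meet the new requirements of big data management, many application-specific NoSQL database systems have emerged. This paper surveys research on NoSQL databases based on the key-value data model. First, it introduces the characteristics of big data and the key technical problems facing systems that support big data management. It then introduces related frontier research and research challenges, typically including system architecture, data model, access methods, indexing techniques, transaction properties, system elasticity, dynamic load balancing, replication strategies, data consistency strategies, flash-based...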

10.3724/sp.j.1001.2013.04416 article EN Journal of Software 2014-01-02

We study the problem of self-supervised and interpretable data cleaning, which automatically extracts repair rules from dirty data. In this paper, we propose a novel framework, namely Garf, based on sequence generative adversarial networks (SeqGAN). One key piece of information Garf tries to capture is repair rules (for example, if the city is "Dothan", then the county should be "Houston"). Garf employs a SeqGAN consisting of a generator G and a discriminator D that trains G to learn the dependency relationships (e.g., given the city value "Dothan" as...

10.14778/3570690.3570694 article EN Proceedings of the VLDB Endowment 2022-11-01

Collaborative Filtering has achieved great success in capturing users' preferences over items. However, existing techniques only consider limited collaborative signals, leading to unsatisfactory results when the user-item interactions are sparse. In this paper, we propose a Cross-grained Neural Collaborative Filtering model (CNCF), which makes recommendation more accurate and explainable. Specifically, we first construct four kinds of interaction graphs with both fine-grained and coarse-grained collaborative signals, which can better compensate...

10.1109/access.2024.3384376 article EN cc-by-nc-nd IEEE Access 2024-01-01

Finding a team that is both competent in performing the task and compatible in working together has been extensively studied. However, most methods for team formation tend to rely on a set of skills only. In order to solve this problem, we present an efficient team formation method based on Constrained Pattern Graph (called CPG). Unlike traditional methods, our method takes into account the structure constraints and communication constraints of team members, which can better meet the requirements of users. First, a CPG preprocessing method is proposed to normalize a CPG and represent it...

10.1109/icde48307.2020.00082 article EN 2020 IEEE 36th International Conference on Data Engineering (ICDE) 2020-04-01

In this paper, we focus on conceptual data modeling and logical database modeling of fuzzy information. The IFO model is extended with fuzzy set theory to cope with imperfect as well as complex objects in the real world at a conceptual level. Concerning information, object-oriented model. Some major notions in object-oriented databases such as objects, classes, objects-classes relationships, supertype/subtype, and multiple inheritances are extended under a fuzzy information environment, and a generic model for fuzzy object-oriented databases is developed. In particular, we develop a formal approach to mapping IF_{2}O...

10.5555/2656168.2656173 article EN Journal of Intelligent & Fuzzy Systems 2006-11-01

Many challenging problems could be better solved by exploiting crowdsourcing platforms than by traditional machine-based methods. However, data quality in crowdsourcing applications has become a crucial aspect since workers may have different capabilities. In this paper, we propose a novel weighted aggregation rule (WAR) to improve the result accuracy of crowdsourcing systems. According to the agreement of the answers given by workers, we classify all tasks into high-agreement tasks and low-agreement tasks. For high-agreement tasks, we use simple majority voting...

10.1109/dasc.2014.54 article EN 2014-08-01

Cross-modal similarity query has become a highlighted research topic for managing multimodal datasets such as images and texts. Existing research generally focuses on accuracy by designing complex deep neural network models and hardly considers efficiency and interpretability simultaneously, which are vital properties of a cross-modal semantic processing system for large-scale datasets. In this work, we investigate multi-grained common embedding representations of images and texts and integrate interpretable index...

10.1007/s41019-021-00162-4 article EN cc-by Data Science and Engineering 2021-05-31

MapReduce has been proven to be a highly desirable platform for scalable parallel data analysis. Task scheduling in MapReduce is crucial for job execution and has a marked impact on system performance. To the best of our knowledge, previous scheduling algorithms rarely consider job-intensive environments and are not able to provide high throughput. Hence, this paper proposes a novel scheduling technique to improve throughput. Firstly, by making an in-depth analysis of job-intensive environments, we sum up 4 major factors which affect throughput. Secondly, based on these factors,...

10.1109/bigdata.congress.2013.36 article EN 2013-06-01

Nowadays, people usually participate in multiple social networks simultaneously, e.g., Facebook and Twitter. Formally, the correspondences of accounts that belong to the same user are defined as anchor links, and the networks aligned by anchor links can be denoted as aligned networks. In this paper, we study the problem of anchor link prediction (ALP) across a pair of aligned networks based on network structure. First, three similarity metrics (CPS, CCS, and CPS+) are proposed. Different from previous works, we focus on the theoretical guarantees of our metrics. We prove...

10.1109/access.2018.2814000 article EN cc-by-nc-nd IEEE Access 2018-01-01