- Advanced Database Systems and Queries
- Data Management and Algorithms
- Graph Theory and Algorithms
- Data Mining Algorithms and Applications
- Cloud Computing and Resource Management
- Advanced Computational Techniques and Applications
- Web Data Mining and Analysis
- Data Quality and Management
- Advanced Text Analysis Techniques
- Text and Document Classification Technologies
- Distributed systems and fault tolerance
- Advanced Graph Neural Networks
- Service-Oriented Architecture and Web Services
- Rough Sets and Fuzzy Logic
- Parallel Computing and Optimization Techniques
- Caching and Content Delivery
- Distributed and Parallel Computing Systems
- Big Data and Business Intelligence
- Image Retrieval and Classification Techniques
- Anomaly Detection Techniques and Applications
- Topic Modeling
- Privacy-Preserving Technologies in Data
- Access Control and Trust
- Cloud Data Security Solutions
- Mathematics, Computing, and Information Processing
Northeastern University
2013-2024
Guangzhou University
2024
Academic Degrees & Graduate Education
2023
Eastern Liaoning University
2008
Northeastern University
2007-2008
Xiaomi (China)
2004
Due to the complexity of blockchain technology, it usually takes too much effort to build, maintain and monitor a blockchain system that supports a targeted application. To this end, the emerging "Blockchain as a Service" (BaaS) makes distributed ledgers more accessible, particularly for businesses, by reducing overheads. BaaS combines the high computing power of cloud computing, the pervasiveness of IoT and the decentralization of blockchain, allowing people to build their own applications while ensuring the transparency and openness of the system. This...
Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and message data on disk to improve scalability. Currently, these systems with disk-resident data take a push-based approach to handle messages. This works well if few messages reside on disk. Otherwise, it is I/O-inefficient due to expensive random writes. By contrast, the existing...
This paper proposes a framework of change data capture and extraction, which captures changed data based on log analysis and processes the captured data further to improve the quality of the data. The processed data are then pushed to a queue system using a priority-based scheduling algorithm. Ultimately, the data are loaded into the real-time data warehouse to support decision analysis. In a test case, this method can capture all changed data coming from the source system in time without changing the structure of the source system, and has little impact on the performance of the system. In addition, the algorithm effectively...
Column-oriented stores, known for their scalability and flexibility, are a common NoSQL database implementation and are increasingly used in big data management. In column-oriented stores, the "full-scan" query strategy is inefficient, and the search space can be reduced if the data is well partitioned or indexed; however, there is no pre-defined schema for building and maintaining partitions and indexes at a lower cost. We leverage an accumulative high-dimensional data model, a sophisticated linearization algorithm, and an efficient query algorithm to solve the challenge of...
Many graph algorithms are iterative in nature and can be supported by distributed memory-based systems in a synchronous manner. However, an asynchronous model has been recently proposed to accelerate iterative computations. Nevertheless, it is challenging to recover from failures in such an asynchronous system, since a typical checkpointing-based approach requires many expensive synchronization barriers that largely offset the gains of asynchronous computations.
In recent years, the prevention and control of environmental pollution has attracted much attention, and haze weather directly affects people's travel and health. In order to effectively prevent and control air pollution, it is necessary to optimize the air quality evaluation system. In this paper, PM2.5, PM10, SO2, NO2, CO, O...
Most graph algorithms are iterative in nature. They can be processed by distributed systems in memory in an efficient asynchronous manner. However, it is challenging to recover from failures in such systems. This is because traditional checkpoint-based fault-tolerant frameworks incur expensive barrier costs that usually offset the gains brought by asynchronous computations. Worse, surviving data are rolled back, leading to costly re-computations. This paper first proposes to leverage surviving data for failure recovery in an asynchronous system. Our framework guarantees...
In real-time data warehouses, ETL is no longer executed periodically during the idle time of the data warehouses, but is continuously ongoing. Thus, triggering the ETL task and scheduling updates and queries become key issues. This paper proposes IBSA (Integration Based Scheduling Approach), including a rule and an algorithm for starting ETL tasks and for balancing updates and queries through thread control. We also propose a framework and implementations of IBSA. A series of experiments show that IBSA can adjust the running order of tasks reasonably and use system resources effectively to provide...
In real-time data warehouses, data import is no longer implemented in a batched and periodic way during the idle time of the data warehouses, but is continuously ongoing. The updates to the warehouses are in conflict with the queries against the warehouses. Thus, scheduling becomes a key issue. This paper proposes a priority-based balance scheduling algorithm (PBBS). Firstly, according to the response requirements and the different levels of the data being updated, it gives priorities to all tasks. Then it makes parallel scheduling, considering task priorities, implementation...
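The priority assignment step described in this abstract can be illustrated with a generic heap-based queue. This is only a minimal sketch of priority-ordered task dispatch in general, not the PBBS algorithm itself (whose details are truncated above). The class name, task names, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq
import itertools


class PriorityTaskQueue:
    """Toy priority queue for update/query tasks (hypothetical; not PBBS itself)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that tasks with equal
        # priority are served in submission (FIFO) order.
        self._counter = itertools.count()

    def submit(self, priority, name):
        # Assumption of this sketch: a lower number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_task(self):
        # Return the most urgent task name, or None if nothing is queued.
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name

    def drain(self):
        # Pop every queued task in scheduling order.
        order = []
        while self._heap:
            order.append(self.next_task())
        return order


def demo_schedule():
    # Hypothetical mix of warehouse update tasks and user queries.
    q = PriorityTaskQueue()
    q.submit(2, "update:sales_fact")
    q.submit(1, "query:dashboard")
    q.submit(2, "update:inventory")
    q.submit(0, "query:alert")
    return q.drain()
```

In this sketch, `demo_schedule()` serves the two queries first and then the two updates in the order they were submitted. A real scheduler would also need to balance parallel workers, which is outside what this example shows.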
Regarding the existing models for feature extraction of complex similar entities, there are problems in the utilization of relative position information and in the ability of key feature extraction. The distinctiveness of Chinese named entity recognition compared to English lies in the absence of space delimiters, the significant polysemy and homonymy of characters, diverse and common names, and a greater reliance on contextual and linguistic structures. An entity recognition method based on DeBERTa-Attention-BiLSTM-CRF (DABC) is proposed. Firstly, the capability of the DeBERTa model...
In order to solve the problems of the small capacity of structured data and the uneven distribution among classes in machine learning tasks, a supervised generation method for structured data called WAGAN and a cyclic sampling method named SACS (Semi-supervised Active-learning Cyclic Sampling), based on semi-supervised active learning, are proposed. The loss function and the neural network structure are optimized, and the quantity and quality of the sample set are enhanced. To enhance the reliability of generating pseudo-labels, a Semi-supervised Active Framework (SAF) is...
Many applications in real life can produce a large amount of data which can be modeled by a graph. A graph usually has millions of vertices and billions of edges. This paper presents a BSP-based system, called BC-BSP+, to process large-scale graphs iteratively and in parallel. It has the flexibility to configure policies (i.e., disk management parameters) and to extend functions (i.e., programming interfaces), and it can compute large-scale graphs, tolerate faults, and balance loads. Especially, three partition strategies of BC-BSP+ are proposed to support...
With the rapid development of the Internet and the World Wide Web (WWW), a very large amount of information is available and ready for downloading, most of which is free of charge. At the same time, hard disks with large capacity are available at affordable prices. Most of us nowadays often dump a large number of various types of documents into our computers without much thinking. On the other hand, file systems have not changed too much during the past decades. Most of them organize files in directories that form a tree structure, and a file is identified by its name and pathname in the directory...
A myriad of parameter estimation algorithms can be performed by an Expectation-Maximization (EM) approach. Traditional synchronous frameworks parallelize these EM algorithms on the cloud to accelerate computation while guaranteeing convergence. However, expensive synchronization costs pose great challenges for efficiency. Asynchronous solutions have been recently designed to bypass high-cost synchronization barriers, but at the expense of potentially losing the convergence guarantee.
With the wide use of access control models in systems, a more agile access control model is required to solve complicated modeling, user authorizing and verifying problems. In this paper, an access control model based on the concepts of role, attribute and context, named C-RBAC, is proposed. This model is a further improved Role-Based Access Control (RBAC) model. The proposed model adds system conditions to access control, distinguishes users that belong to one role by attributes, provides dynamic access control by adopting a conditional concept, and designs a flexible authorization mechanism to reinforce RBAC....
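The combination of role, attribute and context described in this abstract can be pictured with a small access check. This is a minimal sketch under stated assumptions, not the C-RBAC model from the paper: the `User` and `Rule` types, the `is_allowed` function, and the example roles, attributes and working-hours condition are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class User:
    name: str
    roles: Set[str]
    # Attributes distinguish users who share the same role.
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    role: str
    permission: str
    # Attribute values the user must have for this rule to apply.
    required_attributes: Dict[str, str] = field(default_factory=dict)
    # Optional predicate over the request context (system conditions).
    condition: Optional[Callable[[Dict[str, object]], bool]] = None


def is_allowed(user: User, permission: str, rules: List[Rule],
               context: Dict[str, object]) -> bool:
    """Grant access if any rule matches role, attributes and context."""
    for rule in rules:
        if rule.permission != permission or rule.role not in user.roles:
            continue
        if any(user.attributes.get(k) != v
               for k, v in rule.required_attributes.items()):
            continue
        if rule.condition is not None and not rule.condition(context):
            continue
        return True
    return False


# Hypothetical policy: cardiology doctors may read records during working hours.
RULES = [
    Rule(
        role="doctor",
        permission="read_record",
        required_attributes={"department": "cardiology"},
        condition=lambda ctx: 8 <= ctx["hour"] < 18,
    )
]

alice = User("alice", {"doctor"}, {"department": "cardiology"})
bob = User("bob", {"doctor"}, {"department": "oncology"})
```

With this policy, `is_allowed(alice, "read_record", RULES, {"hour": 10})` is granted, while the same request at hour 22, or any request from `bob`, is denied. Keeping the context check as a plain predicate is one simple design choice; the paper's actual authorization mechanism is not visible in the truncated abstract.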