- Geological Modeling and Analysis
- Anomaly Detection Techniques and Applications
- Human Mobility and Location-Based Analysis
- Data Stream Mining Techniques
- Land Use and Ecosystem Services
- Climate Change and Permafrost
- Demographic Modeling and Climate Adaptation
- Network Security and Intrusion Detection
- Machine Learning and Data Classification
- Geophysics and Gravity Measurements
- 3D Modeling in Geospatial Applications
- Remote Sensing and Land Use
- Text and Document Classification Technologies
- Topic Modeling
- Advanced Graph Neural Networks
- Computational Physics and Python Applications
- Opportunistic and Delay-Tolerant Networks
- Climate Variability and Models
- Impact of Light on Environment and Health
- Soil Carbon and Nitrogen Dynamics
- Distributed and Parallel Computing Systems
- Urban Heat Island Mitigation
- Fluoride Effects and Removal
- Environmental and Agricultural Sciences
- Climate Change and Health Impacts
Inner Mongolia Agricultural University
2019-2025
China University of Geosciences
2019-2024
Pacific Northwest National Laboratory
2024
Norwegian Institute for Nature Research
2024
Nanjing University
2024
University of Illinois Urbana-Champaign
2007-2024
University of Delaware
2019-2024
University of Twente
2024
Purdue University West Lafayette
2021-2024
Northwest Normal University
2024
Urban land expansion is one of the most visible, irreversible, and rapid types of land cover/land use change in contemporary human history, and is a key driver for many environmental and societal changes across scales. Yet spatial projections of how much and where it may occur are often limited to short-term futures and small geographic areas. Here we produce a first empirically-grounded set of global, spatial urban land projections over the 21st century. We use a data-science approach exploiting 15 diverse datasets, including a newly available 40-year...
Most existing data stream classification techniques ignore one important aspect of stream data: the arrival of a novel class. We address this issue and propose a data stream classification technique that integrates a novel class detection mechanism into traditional classifiers, enabling automatic detection of novel classes before the true labels of the novel class instances arrive. The novel class detection problem becomes more challenging in the presence of concept-drift, when the underlying data distributions evolve in streams. In order to determine whether an instance belongs to a novel class, the classification model sometimes needs to wait for...
The effectiveness of knowledge transfer using classification algorithms depends on the difference between the distribution that generates the training examples and the one from which test examples are to be drawn. The task can be especially difficult when the training examples are from one or several domains different from the test domain. In this paper, we propose a locally weighted ensemble framework to combine multiple models for transfer learning, where the weights are dynamically assigned according to a model's predictive power on each test example. It can integrate the advantages of various learning...
In recent years, there have been some interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams but cannot handle well rather skewed (e.g., few positives but lots of negatives) and stochastic distributions, which are typical in many data stream applications. In this paper, we propose a new approach to mine data streams by estimating reliable posterior probabilities using an ensemble of models to match the distribution over under-samples of negatives and repeated samples...
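As a rough illustration of the idea in the abstract above (an ensemble whose members each see every positive but only an under-sample of the negatives, with their posterior estimates averaged), here is a minimal sketch. It is not the authors' implementation: the one-dimensional data, the unit-variance Gaussian base learner, and all function names (`fit_base_model`, `fit_undersampled_ensemble`, `ensemble_posterior`) are assumptions made for the example.

```python
import math
import random
from statistics import mean


def fit_base_model(pos, neg):
    """Hypothetical 1-D base learner.

    Assumes unit-variance Gaussian class conditionals and equal priors
    on the balanced training sample, so the posterior is a logistic
    function of the difference in squared distances to the class means.
    """
    mu_pos, mu_neg = mean(pos), mean(neg)

    def posterior(x):
        log_odds = ((x - mu_neg) ** 2 - (x - mu_pos) ** 2) / 2.0
        return 1.0 / (1.0 + math.exp(-log_odds))

    return posterior


def fit_undersampled_ensemble(pos, neg, n_models=5, seed=0):
    """Train n_models learners, each on all positives plus an equally
    sized random under-sample of the negatives."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))
    return [fit_base_model(pos, rng.sample(neg, k)) for _ in range(n_models)]


def ensemble_posterior(models, x):
    """Average the members' posterior estimates for the positive class."""
    return sum(m(x) for m in models) / len(models)


# Skewed toy data: 3 positives, 30 negatives (illustrative values only).
positives = [4.8, 5.1, 5.3]
negatives = [0.1 * i for i in range(30)]
models = fit_undersampled_ensemble(positives, negatives)
```

Reusing the full positive set in every member is one simple reading of "repeated samples" of positives; how the original work handles successive stream chunks is not recoverable from the truncated abstract.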
Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called "information networks"), closely related objects that share the same properties or interests form a community. For example, a community in blogsphere could be users mostly interested in cell phone news. Outlier detection in information networks can...
Document networks, i.e., networks associated with text information, are becoming increasingly popular due to the ubiquity of Web documents, blogs, and various kinds of online data. In this paper, we propose a novel topic modeling framework for document networks, which builds a unified generative topic model that is able to consider both text and structure information for documents. A graphical model is proposed to describe the generative model. On the top layer of this graphical model, we define a multivariate Markov random field for the topic distribution random variables of each document, to model the dependency...
Recent approaches in classifying evolving data streams are based on supervised learning algorithms, which can be trained with labeled data only. Manual labeling of data is both costly and time consuming. Therefore, in a real streaming environment, where huge volumes of data appear at a high speed, labeled data may be very scarce. Thus, only a limited amount of training data may be available for building the classification models, leading to poorly trained classifiers. We apply a novel technique to overcome this problem by building a classification model from a training set having both unlabeled and a small...
Recent years have witnessed an increasing number of studies in stream mining, which aim at building an accurate model for continuously arriving data. Somehow most existing work makes the implicit assumption that the training data and the yet-to-come testing data are always sampled from the "same distribution", and yet this "same distribution" evolves over time. We demonstrate that this may not be true, and one may actually never know either "how" or "when" the distribution changes. Thus, a model that fits well on the observed distribution can have unsatisfactory accuracy...
The problem of data stream classification is challenging because of many practical aspects associated with efficient processing and temporal behavior of the stream. Two such well studied aspects are infinite length and concept-drift. Since a data stream may be considered a continuous process, which is theoretically infinite in length, it is impractical to store and use all the historical data for training. Data streams also frequently experience concept-drift as a result of changes in the underlying concepts. However, another important characteristic of data streams,...
Data stream classification poses many challenges to the data mining community. In this paper, we address four such major challenges, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Concept-drift is a common phenomenon in data streams, which occurs as a result of changes in the underlying concepts. Concept-evolution occurs as a result of new classes evolving in the stream. Feature-evolution is a frequently occurring process in text...
The Global Human Settlement Layer (GHSL) project fosters an enhanced, public understanding of the human presence on Earth. A decade after its inception in the Digital Earth 2020 vision, GHSL is an established project of the European Commission's Joint Research Centre and an integral part of the Copernicus Emergency Management Service. The 2023 GHSL edition, a result of rigorous research on Earth Observation data and population censuses, contributes significantly to the understanding of worldwide human settlements. It introduces new elements like 10-m-resolution, sub-pixel...
Improvements in high-resolution satellite remote sensing and computational advancements have sped up the development of global datasets that delineate urban land, crucial for understanding climate risks in our increasingly urbanizing world. Here, we analyze urban land cover patterns across spatiotemporal scales from several such current-generation products. While all show a rapidly urbanizing world, with global urban land nearly tripling between 1985 and 2015, there are substantial discrepancies in urban land area estimates among the products...
Current outlier detection schemes typically output a numeric score representing the degree to which a given observation is an outlier. We argue that converting the scores into well-calibrated probability estimates is more favorable for several reasons. First, the probability estimates allow us to select the appropriate threshold for declaring outliers using a Bayesian risk model. Second, the probability estimates obtained from individual models can be aggregated to build an ensemble outlier detection framework. In this paper, we present two methods for transforming outlier scores into probabilities. The first...
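To make the two stated benefits concrete, the sketch below shows the standard Bayes decision rule for choosing a threshold once calibrated probabilities are available, and a plain average as one simple way to aggregate several detectors. The cost values, the averaging rule, and the function names are assumptions for illustration; the abstract is truncated before it describes the authors' own transformation methods, so none of this should be read as their procedure.

```python
def bayes_risk_threshold(cost_false_alarm, cost_miss):
    """Probability above which flagging an outlier has lower expected cost.

    Flagging costs (1 - p) * cost_false_alarm in expectation and not
    flagging costs p * cost_miss (correct decisions assumed to cost
    nothing), so flagging is cheaper when
    p > cost_false_alarm / (cost_false_alarm + cost_miss).
    """
    return cost_false_alarm / (cost_false_alarm + cost_miss)


def average_probabilities(per_model_probs):
    """Hypothetical aggregation rule: the mean of the detectors' estimates."""
    return sum(per_model_probs) / len(per_model_probs)


def is_outlier(prob, cost_false_alarm=1.0, cost_miss=1.0):
    """Apply the Bayes-risk threshold to a calibrated outlier probability."""
    return prob > bayes_risk_threshold(cost_false_alarm, cost_miss)


# Three detectors' calibrated estimates for one observation (made-up values).
combined = average_probabilities([0.2, 0.4, 0.3])
flag_equal_costs = is_outlier(combined)                 # threshold 0.5
flag_costly_miss = is_outlier(combined, cost_miss=4.0)  # threshold 0.2
```

With equal costs the combined estimate of about 0.3 falls below the 0.5 threshold, while making a missed outlier four times as costly as a false alarm lowers the threshold to 0.2 and flips the decision. Raw scores on an arbitrary scale would not support this kind of cost-based reasoning.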
Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. More and more applications are featuring data streams, rather than finite stored data sets, which are a challenge for traditional classification algorithms. Concept drifts and skewed distributions, two common properties of data stream applications, make the task of learning in streams difficult. The authors aim to develop an approach to classify data streams using an ensemble of models to match the distribution over under-samples...
Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patterns is exponential but many are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent patterns, followed by feature selection. Although many methods have been proposed in the past few years on how to perform each separate step efficiently, there is still limited success in eventually finding highly compact...