- Complex Network Analysis Techniques
- Advanced Graph Neural Networks
- Imbalanced Data Classification Techniques
- Topic Modeling
- Opinion Dynamics and Social Influence
- Anomaly Detection Techniques and Applications
- Machine Learning and Data Classification
- Bioinformatics and Genomic Networks
- Data Mining Algorithms and Applications
- Data Stream Mining Techniques
- Recommender Systems and Techniques
- Mental Health Research Topics
- Human Mobility and Location-Based Analysis
- Computational Drug Discovery Methods
- Text and Document Classification Technologies
- Time Series Analysis and Forecasting
- Mobile Health and mHealth Applications
- Network Security and Intrusion Detection
- Artificial Intelligence in Healthcare
- Machine Learning in Materials Science
- Advanced Text Analysis Techniques
- Data Visualization and Analytics
- Natural Language Processing Techniques
- Technology Use by Older Adults
- Biomedical Text Mining and Ontologies
University of Notre Dame
2016-2025
American Society For Engineering Education
2024
Purdue University West Lafayette
2024
History of Science Society
2022-2024
Hefei University of Technology
2022
Hebrew University of Jerusalem
2022
Hong Kong Polytechnic University
2022
Zhejiang Lab
2022
Beijing University of Posts and Telecommunications
2022
Los Alamitos Medical Center
2022
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means...
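As a rough illustration of the under-sampling idea mentioned in this abstract, the sketch below randomly discards majority-class examples until the majority class is no larger than the biggest remaining class. The function name `undersample_majority` and its parameters are hypothetical, and the balancing target is an assumption made for the example; the truncated abstract does not specify one.

```python
import random
from collections import Counter


def undersample_majority(samples, labels, majority_label, seed=0):
    """Randomly drop majority-class examples (illustrative sketch only).

    Assumption: the majority class is cut down to the size of the largest
    other class. Raises ValueError if no other class is present.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    # Size of the largest non-majority class; this is the balancing target.
    target = max(c for lab, c in counts.items() if lab != majority_label)
    majority_idx = [i for i, lab in enumerate(labels) if lab == majority_label]
    keep = set(rng.sample(majority_idx, min(target, len(majority_idx))))
    # Keep every non-majority example plus the sampled majority examples.
    kept = [i for i, lab in enumerate(labels)
            if lab != majority_label or i in keep]
    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    data = list(range(100))
    tags = ["normal"] * 95 + ["abnormal"] * 5
    _, balanced_tags = undersample_majority(data, tags, "normal")
    print(Counter(balanced_tags))
```

Discarding examples loses information from the majority class, which is one reason under-sampling is often combined with or compared against over-sampling of the minority class.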
We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of conventional network embedding techniques. We develop two scalable models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the neighborhood of a node and then leverages a skip-gram model to perform node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations...
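The meta-path-based random walk described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `metapath_walk`, the adjacency-dictionary graph format, and the toy author/paper graph are all assumptions made for the example. It covers only the walk-generation step; the resulting walks would then be fed to a skip-gram model, which is not shown.

```python
import random


def metapath_walk(neighbors, node_type, start, metapath, walk_length, seed=0):
    """Generate one random walk whose node types follow a meta-path.

    neighbors: dict mapping node -> list of adjacent nodes (assumed format)
    node_type: dict mapping node -> type label
    metapath:  symmetric type pattern such as ["A", "P", "A"]; it is
               repeated cyclically until walk_length nodes are visited.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    rng = random.Random(seed)
    cycle = metapath[:-1]  # the last type equals the first, so drop it
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        # Only neighbors of the type required by the meta-path are eligible.
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


if __name__ == "__main__":
    # Toy author (A) / paper (P) graph, invented for illustration.
    graph = {"a1": ["p1"], "a2": ["p1", "p2"], "p1": ["a1", "a2"], "p2": ["a2"]}
    types = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
    print(metapath_walk(graph, types, "a1", ["A", "P", "A"], walk_length=5))
```

Restricting each step to the type the meta-path requires is what keeps the sampled neighborhood from being dominated by the most frequent node type.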
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its design procedure, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel...
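The core step of SMOTE, interpolating between a minority example and one of its nearest minority-class neighbors, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `smote` and its parameters are hypothetical, base examples are picked at random rather than looping over every minority example as the original procedure does, and neighbors are found by brute-force Euclidean distance.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority examples by interpolation (simplified sketch).

    minority: list of equal-length numeric feature vectors (minority class only)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority examples by squared Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(x, minority[j])))
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from x to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in smote(points, n_synthetic=3, k=2):
        print(row)
```

Because each synthetic point lies on a line segment between two real minority examples, the new examples stay inside the region the minority class already occupies.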
Representation learning in heterogeneous graphs aims to pursue a meaningful vector representation for each node so as to facilitate downstream applications such as link prediction, personalized recommendation, node classification, etc. This task, however, is challenging not only because of the demand to incorporate structural (graph) information consisting of multiple types of nodes and edges, but also due to the need to consider attributes or contents (e.g., text or image) associated with each node. Despite substantial...
This paper examines important factors for link prediction in networks and provides a general, high-performance framework for the task. Link prediction in sparse networks presents a significant challenge due to the inherent disproportion of links that can form to links that do form. Previous research has typically approached this as an unsupervised problem. While this is not the first work to explore supervised learning, many factors influencing and guiding classification remain unexplored. In this paper, we consider these factors by motivating the use of a supervised framework through careful...
Nowadays, multivariate time series data are increasingly collected in various real-world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging since it not only needs to capture the temporal dependency in each time series, but also needs to encode the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide...
Social status, defined as the relative rank or position that an individual holds in a social hierarchy, is known to be among the most important motivating forces in social behaviors. In this paper, we consider the notion of status from the perspective of a title held by a person in an enterprise. We study the intersection of social status and social networks, and whether enterprise communication logs can help reveal how social interactions manifest themselves in networks. To this end, we use two datasets with three communication channels (voice call, short message, and email) to demonstrate...
"Big Data" as a term has been among the biggest trends of the last three years, leading to an upsurge of research, as well as industry and government applications. Data is deemed a powerful raw material that can impact multidisciplinary research endeavors as well as business performance. The goal of this discussion paper is to share the data analytics opinions and perspectives of the authors relating to the new opportunities and challenges brought forth by the big data movement. The authors bring together diverse perspectives, coming from different geographical locations...
Link prediction and recommendation is a fundamental problem in social network analysis. The key challenge of link prediction comes from the sparsity of networks, due to the strong disproportion between the links that have the potential to form and the links that do form. Most previous work tries to solve the problem in a single network, and little research focuses on capturing the general principles of link formation across heterogeneous networks. In this work, we give a formal definition of link recommendation across heterogeneous networks. Then we propose a ranking factor graph model (RFG) for predicting links in social networks, which effectively improves...
Despite over two decades of progress, imbalanced data is still considered a significant challenge for contemporary machine learning models. Modern advances in deep learning have further magnified the importance of the imbalanced data problem, especially when learning from images. Therefore, there is a need for an oversampling method that is specifically tailored to deep learning models, can work on raw images while preserving their properties, and is capable of generating high-quality, artificial images that can enhance minority classes and balance the training set. We propose Deep...
Link prediction, i.e., predicting links or interactions between objects in a network, is an important task in network analysis. Although the problem has attracted much attention recently, there are several challenges that have not been addressed so far. First, most existing studies focus only on link prediction in homogeneous networks, where all objects and links belong to the same type. However, in the real world, heterogeneous networks that consist of multi-typed objects and relationships are ubiquitous. Second, current studies concern only whether a link will...
Demographics are widely used in marketing to characterize different types of customers. However, in practice, demographic information such as age, gender, and location is usually unavailable due to privacy and other reasons. In this paper, we aim to harness the power of big data to automatically infer users' demographics based on their daily mobile communication patterns. Our study is based on a real-world large mobile network of more than 7,000,000 users and over 1,000,000,000 communication records (CALL and SMS). We discover several interesting social...
Big Data applications have emerged in recent years, and researchers from many disciplines are aware of the considerable advantages of extracting knowledge from this type of problem. However, traditional learning approaches cannot be directly applied due to scalability issues. To overcome this issue, the MapReduce framework has arisen as a "de facto" solution. Basically, it carries out a "divide-and-conquer" distributed procedure in a fault-tolerant way, adapted for commodity hardware. Being still a recent discipline, few...
To ensure the correctness of network analysis methods, the network (as the input) has to be a sufficiently accurate representation of the underlying data. However, when representing sequential data from complex systems such as global shipping traffic or web clickstream traffic as networks, conventional network representations that implicitly assume the Markov property (first-order dependency) can quickly become limiting. This assumption holds that when movements are simulated on the network, the next movement depends only on the current node, discounting the fact...
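To make the first-order limitation concrete, the sketch below estimates transition probabilities from raw sequences while conditioning on the last one or two visited nodes. It only illustrates the Markov assumption the abstract describes and is not the representation proposed in the paper; the function `transition_probs` and the toy shipping routes are invented for the example.

```python
from collections import Counter, defaultdict


def transition_probs(sequences, order=1):
    """Estimate P(next node | last `order` nodes) from observed sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for t in range(order, len(seq)):
            context = tuple(seq[t - order:t])  # the last `order` nodes
            counts[context][seq[t]] += 1
    return {
        ctx: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for ctx, ctr in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical routes: ships arriving at C from A continue to D,
    # while ships arriving at C from B continue to E.
    routes = [["A", "C", "D"]] * 3 + [["B", "C", "E"]] * 3
    print(transition_probs(routes, order=1)[("C",)])
    print(transition_probs(routes, order=2)[("A", "C")])
```

With `order=1` the model sees only the current node C and splits its probability between D and E, whereas `order=2` recovers the fact that the previous port determines the next one.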