- Sparse and Compressive Sensing Techniques
- Face and Expression Recognition
- Text and Document Classification Technologies
- Stochastic Gradient Optimization Techniques
- Complex Network Analysis Techniques
- Machine Learning and Algorithms
- Advanced Clustering Algorithms Research
- Advanced Graph Neural Networks
- Topic Modeling
- Matrix Theory and Algorithms
- Machine Learning and Data Classification
- Domain Adaptation and Few-Shot Learning
- Advanced Optimization Algorithms Research
- Machine Learning and ELM
- Bayesian Methods and Mixture Models
- Gene expression and cancer classification
- Data Management and Algorithms
- Statistical Methods and Inference
- Tensor decomposition and applications
- Neural Networks and Applications
- Natural Language Processing Techniques
- Blind Source Separation Techniques
- Anomaly Detection Techniques and Applications
- Bioinformatics and Genomic Networks
- Recommender Systems and Techniques
Amazon (United States): 2019-2024
Google (United States): 2023-2024
The University of Texas at Austin: 2014-2023
Amazon (Germany): 2017-2022
Search: 2022
National Taiwan University: 2014
Max Planck Society: 2010
IBM Research - Almaden: 1998-2000
University of California, Berkeley: 1996-1997
University of Tennessee at Knoxville: 1996
In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem: that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it...
Both document clustering and word clustering are well-studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The...
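The bipartitioning step can be sketched in NumPy under some assumptions. The input is taken to be a dense, nonnegative word-by-document matrix with no empty rows or columns. The "appropriate scaling" is taken to be D1^{-1/2} A D2^{-1/2}, where D1 and D2 hold the word and document degrees. The final split of the 1-D embedding uses an exact two-cluster 1-D k-means. The function names are hypothetical, and the last step is one reasonable choice rather than a detail stated in the abstract.

```python
import numpy as np


def _two_means_1d(z):
    """Exact 2-means on 1-D data: the optimal split is contiguous in sorted order."""
    order = np.argsort(z)
    zs = z[order]
    best_k, best_cost = 1, np.inf
    for k in range(1, len(zs)):
        left, right = zs[:k], zs[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    labels = np.empty(len(z), dtype=int)
    labels[order[:best_k]] = 0
    labels[order[best_k:]] = 1
    return labels


def spectral_cocluster_bipartition(A):
    """Bipartition words (rows) and documents (columns) of a nonnegative matrix A.

    Assumes every row and column of A has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # word degrees in the bipartite graph
    d2 = A.sum(axis=0)  # document degrees
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]  # D1^{-1/2} A D2^{-1/2}
    U, _, Vt = np.linalg.svd(An)
    u2, v2 = U[:, 1], Vt[1, :]  # second left and right singular vectors
    # Map back through the degree scalings to get one joint 1-D embedding
    z = np.concatenate([u2 / np.sqrt(d1), v2 / np.sqrt(d2)])
    labels = _two_means_1d(z)
    m = A.shape[0]
    return labels[:m], labels[m:]
```

Consider the block-structured matrix `[[5, 5, .1, .1], [5, 5, .1, .1], [.1, .1, 5, 5], [.1, .1, 5, 5]]`. Here words 0 and 1 are grouped with documents 0 and 1, and words 2 and 3 with documents 2 and 3.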
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical kmeans and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical kmeans algorithm, while generalizing the basic idea to a very large class of clustering loss...
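A minimal sketch of hard clustering with a pluggable Bregman divergence follows, assuming dense NumPy inputs. It uses the fact that, for any Bregman divergence, the loss-minimizing cluster representative is the arithmetic mean. Only the assignment step changes with the divergence. Two example divergences are included: squared Euclidean distance, which gives classical kmeans, and KL divergence on strictly positive probability vectors. All names, and the `init_idx` parameter, are hypothetical.

```python
import numpy as np


def squared_euclidean(X, M):
    """(n, k) matrix of ||x_i - m_j||^2."""
    return ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)


def kl_divergence(X, M):
    """(n, k) matrix of KL(x_i || m_j); rows must be strictly positive and sum to 1."""
    return (X[:, None, :] * np.log(X[:, None, :] / M[None, :, :])).sum(axis=2)


def bregman_hard_clustering(X, k, divergence, n_iter=50, init_idx=None, seed=0):
    X = np.asarray(X, dtype=float)
    if init_idx is None:
        init_idx = np.random.default_rng(seed).choice(len(X), size=k, replace=False)
    M = X[np.asarray(init_idx)].copy()
    for _ in range(n_iter):
        labels = divergence(X, M).argmin(axis=1)  # assign each point to its nearest centroid
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                # For every Bregman divergence, the best representative is the plain mean
                M[j] = members.mean(axis=0)
    return divergence(X, M).argmin(axis=1), M
```

Swapping `squared_euclidean` for `kl_divergence` changes only how points are assigned; the centroid update stays the same.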
Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have remained only loosely related. In this paper, we give an explicit theoretical connection between them. We show the generality of the weighted kernel k-means objective function, and derive the spectral clustering objective of normalized cut as a special case. Given a positive definite similarity matrix, our results lead to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut. This has...
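Weighted kernel k-means can be sketched using kernel entries only. The distance from a point to a cluster's weighted mean in feature space expands into three terms of K. To connect it to normalized cut, the sketch assumes a positive definite similarity matrix A with degree matrix D. It sets the kernel to D^{-1} A D^{-1} and the weights to the degrees. With that choice, the weighted kernel k-means objective differs from the normalized cut only by a constant. The function names are hypothetical.

```python
import numpy as np


def weighted_kernel_kmeans(K, w, labels, n_iter=100):
    """K: (n, n) kernel matrix, w: (n,) positive weights, labels: initial clusters 0..k-1."""
    labels = np.asarray(labels).copy()
    k, n = labels.max() + 1, len(w)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            if not mask.any():
                continue
            wc, s = w[mask], w[mask].sum()
            # ||phi(a) - m_c||^2 = K_aa - 2 sum_b w_b K_ab / s + sum_{b,l} w_b w_l K_bl / s^2
            second = K[:, mask] @ wc / s
            third = wc @ K[np.ix_(mask, mask)] @ wc / s**2
            dist[:, c] = np.diag(K) - 2 * second + third
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def normalized_cut(A, labels):
    """Sum over clusters of links(V_c, V \\ V_c) / degree(V_c)."""
    d = A.sum(axis=1)
    total = 0.0
    for c in np.unique(labels):
        m = labels == c
        total += A[np.ix_(m, ~m)].sum() / d[m].sum()
    return total
```

For example, take `d = A.sum(axis=1)` and `K = A / d[:, None] / d[None, :]`. Then `weighted_kernel_kmeans(K, d, init)` should not increase `normalized_cut(A, labels)` relative to `init`.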
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row...
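The quantity this formulation optimizes can be computed directly. The sketch below is not a co-clustering algorithm. It evaluates the loss in mutual information, I(X;Y) - I(X̂;Ŷ), for a given pair of row and column labelings. The clustered joint distribution p(x̂, ŷ) is obtained by summing the table over each block. The function names are hypothetical.

```python
import numpy as np


def mutual_information(P):
    """I(X;Y) in nats for a nonnegative table P (normalized internally)."""
    P = P / P.sum()
    px = P.sum(axis=1, keepdims=True)
    py = P.sum(axis=0, keepdims=True)
    nz = P > 0  # 0 * log 0 is treated as 0
    return float((P[nz] * np.log(P[nz] / (px @ py)[nz])).sum())


def coclustered_joint(P, row_labels, col_labels, k, l):
    """p(xhat, yhat): sum of p(x, y) over each (row cluster, column cluster) block."""
    R = np.eye(k)[row_labels]  # (m, k) row-cluster indicators
    C = np.eye(l)[col_labels]  # (n, l) column-cluster indicators
    return R.T @ P @ C


def mi_loss(P, row_labels, col_labels, k, l):
    P = P / P.sum()
    return mutual_information(P) - mutual_information(
        coclustered_joint(P, row_labels, col_labels, k, l))
```

Minimizing `mi_loss` is the same as maximizing I(X̂;Ŷ), because I(X;Y) is fixed by the data. For a block-diagonal table, the block-aligned co-clustering loses nothing, while a misaligned one loses all of the information.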
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods; in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized...
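The two named objectives differ only in how each cluster's cut is normalized. Ratio cut divides by cluster size, and normalized cut divides by cluster degree. The sketch below evaluates both for a given partition of a weighted adjacency matrix. It does not implement the multilevel algorithm, and the function name is hypothetical.

```python
import numpy as np


def cut_objectives(A, labels):
    """Return (ratio_cut, normalized_cut) for a symmetric weighted adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    deg = A.sum(axis=1)
    ratio_cut = normalized_cut = 0.0
    for c in np.unique(labels):
        inside = labels == c
        cut = A[np.ix_(inside, ~inside)].sum()  # links(V_c, V \ V_c)
        ratio_cut += cut / inside.sum()          # normalize by |V_c|
        normalized_cut += cut / deg[inside].sum()  # normalize by degree(V_c)
    return ratio_cut, normalized_cut
```

As a worked case, take two triangles joined by a single edge and split them along that edge. Each side has cut 1, size 3 and degree 7, so the ratio cut is 2/3 and the normalized cut is 2/7.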
Several large-scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In...
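A full vMF mixture requires estimating concentration parameters, which this sketch leaves out. It shows only the simpler hard-assignment case with a shared concentration, which is known to reduce to spherical k-means. Points are L2-normalized onto the unit sphere and assigned by cosine similarity. Each centroid is then updated to the normalized mean direction of its cluster. The function name and the `init_idx` parameter are hypothetical.

```python
import numpy as np


def spherical_kmeans(X, k, n_iter=50, init_idx=None):
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalize onto the unit sphere
    idx = np.arange(k) if init_idx is None else np.asarray(init_idx)
    M = X[idx].copy()
    for _ in range(n_iter):
        labels = (X @ M.T).argmax(axis=1)  # highest cosine similarity wins
        for j in range(k):
            s = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(s)
            if norm > 0:
                M[j] = s / norm  # mean direction of cluster j
    return (X @ M.T).argmax(axis=1), M
```

Vector length is ignored once the data is normalized, so `[1, 0.1]` and `[2, 0.1]` end up in the same cluster.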
High dimensionality of text can be a deterrent in applying complex learners such as Support Vector Machines to the task of text classification. Feature clustering is a powerful alternative to feature selection for reducing the dimensionality of text data. In this paper we propose a new information-theoretic divisive algorithm for feature/word clustering and apply it to text classification. Existing techniques for such distributional clustering of words are agglomerative in nature and result in (i) sub-optimal word clusters and (ii) high computational cost. In order to explicitly capture the optimality of word clusters in an information...
Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications, including communications, coding, and sparse approximation, require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix...
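Alternating projection can be illustrated on one concrete design problem. The sketch looks for a unit-norm tight frame of N vectors in R^d. It alternates between the set of d-by-N matrices with unit-norm columns and the set of α-tight frames, those with F Fᵀ = αI and α = N/d. The specific constraint sets and both nearness solutions are assumptions chosen for illustration. The nearest α-tight frame is the scaled polar factor √α·U Vᵀ from the SVD, and the nearest unit-norm-column matrix comes from normalizing each column. All function names are hypothetical.

```python
import numpy as np


def nearest_tight_frame(S, alpha):
    """Closest F (Frobenius norm) with F @ F.T == alpha * I: the scaled polar factor of S."""
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    return np.sqrt(alpha) * U @ Vt


def nearest_unit_norm(F):
    """Closest matrix with unit-norm columns: normalize each (nonzero) column."""
    return F / np.linalg.norm(F, axis=0, keepdims=True)


def unit_norm_tight_frame(d, N, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    S = nearest_unit_norm(rng.standard_normal((d, N)))
    alpha = N / d  # unit columns give trace(S S^T) = N, so a tight frame needs alpha = N/d
    gaps = []
    for _ in range(n_iter):
        F = nearest_tight_frame(S, alpha)
        gaps.append(np.linalg.norm(S - F))  # distance between the two constraint sets
        S = nearest_unit_norm(F)
    return S, gaps
```

Each projection is a nearest-point map, so the recorded gap can never increase from one iteration to the next. In practice it often shrinks toward zero. Because the constraint sets are not convex, however, the iteration is not guaranteed to reach a point that lies in both.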
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Currently available methods for computing such a bound are either time-consuming or deliver low-quality bounds that are too loose to be useful. In this paper, we exploit the special structure of ReLU networks and provide two computationally efficient...
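The excerpt ends before the two methods are described, so the sketch below is not the paper's algorithm. It illustrates the general idea of certifying robustness with linear relaxations of ReLU, simplified to a single hidden layer under an ℓ∞ perturbation of radius eps. The first step bounds each pre-activation over the ball. Any unstable neuron, one with l < 0 < u, is then sandwiched between two lines of slope u/(u - l). This gives a linear lower bound on the output, which is minimized in closed form over the ball. The network shape and the function name are assumptions.

```python
import numpy as np


def linear_relaxation_lower_bound(W1, b1, w2, b2, x0, eps):
    """Certified lower bound on f(x) = w2 @ relu(W1 @ x + b1) + b2 for ||x - x0||_inf <= eps."""
    z0 = W1 @ x0 + b1
    radius = eps * np.abs(W1).sum(axis=1)
    l, u = z0 - radius, z0 + radius  # pre-activation bounds over the ball
    # Linear bounds a*z + c_lo <= relu(z) <= a*z + c_hi valid on [l, u]
    a = np.where(l >= 0, 1.0, 0.0)  # stable active -> identity, stable inactive -> zero
    c_lo, c_hi = np.zeros_like(z0), np.zeros_like(z0)
    unstable = (l < 0) & (u > 0)
    s = u[unstable] / (u[unstable] - l[unstable])
    a[unstable] = s  # lower line s*z, upper line s*(z - l)
    c_hi[unstable] = -s * l[unstable]
    # Positive output weights need the lower line; negative weights need the upper line
    c = np.where(w2 >= 0, c_lo, c_hi)
    lam = W1.T @ (w2 * a)  # f(x) >= lam @ x + const on the ball
    const = w2 @ (a * b1 + c) + b2
    return lam @ x0 - eps * np.abs(lam).sum() + const  # minimum of a linear function over the box
```

Apply this bound to the margin between the true class and another class. If the bound stays positive for a given eps, no perturbation within that radius can flip the decision. The largest such eps is therefore a certified lower bound on the minimum distortion.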
In graph-based learning models, entities are often represented as vertices in an undirected graph with weighted edges describing the relationships between entities. In many real-world applications, however, entities are often associated with relations of different types and/or from different sources, which can be well captured by multiple undirected graphs over the same set of vertices. How to exploit such multiple sources of information to make better inferences on entities remains an interesting open problem. In this paper, we focus on the problem of clustering the vertices based on multiple graphs in both...
Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to...
Co-clustering, or simultaneous clustering of rows and columns of a two-dimensional data matrix, is rapidly becoming a powerful data analysis technique. Co-clustering has enjoyed wide success in varied application domains such as text clustering, gene-microarray analysis, natural language processing and image, speech and video analysis. In this paper, we introduce a partitional co-clustering formulation that is driven by the search for a good matrix approximation: every co-clustering is associated with an approximation of the original...
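The idea that each co-clustering defines a matrix approximation can be sketched in its simplest form. Each entry is replaced by the mean of its (row cluster, column cluster) block, and the approximation is scored by squared error. Rows and then columns are reassigned to whichever cluster's block means fit them best. Squared error, block means and the function names are all assumptions for illustration; the formulation in the abstract is stated more generally.

```python
import numpy as np


def block_means(Z, rows, cols, k, l):
    """(k, l) matrix of co-cluster block means; empty blocks get 0."""
    R, C = np.eye(k)[rows], np.eye(l)[cols]
    sums = R.T @ Z @ C
    counts = R.sum(axis=0)[:, None] * C.sum(axis=0)[None, :]
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)


def approx_error(Z, rows, cols, M):
    """Squared error of approximating Z[i, j] by M[rows[i], cols[j]]."""
    return float(((Z - M[np.ix_(rows, cols)]) ** 2).sum())


def coclustering_sq(Z, rows, cols, k, l, n_iter=20):
    rows, cols = np.array(rows), np.array(cols)
    errors = []
    for _ in range(n_iter):
        M = block_means(Z, rows, cols, k, l)
        errors.append(approx_error(Z, rows, cols, M))
        # Move each row to the row cluster whose block means best reproduce it
        rows = np.array([np.argmin([((Z[i] - M[g][cols]) ** 2).sum() for g in range(k)])
                         for i in range(Z.shape[0])])
        M = block_means(Z, rows, cols, k, l)
        # Then move each column the same way
        cols = np.array([np.argmin([((Z[:, j] - M[rows, h]) ** 2).sum() for h in range(l)])
                         for j in range(Z.shape[1])])
    return rows, cols, errors
```

Each step either reassigns labels with the block means held fixed or recomputes the means for fixed labels. As a result, the recorded approximation error never increases.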
Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) are two popular approaches to compute matrix factorization. There has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale...
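The excerpt is cut off before the proposed method is described, so what follows is not necessarily the paper's method. It is a minimal sketch of rank-one cyclic coordinate descent, one alternative to ALS and SGD. The objective is regularized squared error on the observed entries. Each column pair (u_t, v_t) is refit against the residual using closed-form scalar updates. Each update costs time linear in the number of observed ratings and involves no rank-cubed solve. All names and defaults are hypothetical.

```python
import numpy as np


def objective(rows, cols, vals, U, V, lam):
    """Squared error on observed entries plus L2 regularization."""
    pred = (U[rows] * V[cols]).sum(axis=1)
    return float(((vals - pred) ** 2).sum() + lam * ((U ** 2).sum() + (V ** 2).sum()))


def rank_one_ccd(rows, cols, vals, m, n, rank, lam=0.1, outer=10, inner=3, seed=0):
    """rows, cols, vals: coordinates and values of the observed entries."""
    rng = np.random.default_rng(seed)
    U = np.zeros((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    resid = vals - (U[rows] * V[cols]).sum(axis=1)  # residual on observed entries only
    history = [objective(rows, cols, vals, U, V, lam)]
    for _ in range(outer):
        for t in range(rank):
            u, v = U[:, t].copy(), V[:, t].copy()
            resid += u[rows] * v[cols]  # add back rank-one term t
            for _ in range(inner):
                # Exact minimizer: u_i = sum_j r_ij v_j / (lam + sum_j v_j^2) over observed j
                u = np.bincount(rows, weights=resid * v[cols], minlength=m) / (
                    lam + np.bincount(rows, weights=v[cols] ** 2, minlength=m))
                v = np.bincount(cols, weights=resid * u[rows], minlength=n) / (
                    lam + np.bincount(cols, weights=u[rows] ** 2, minlength=n))
            resid -= u[rows] * v[cols]  # remove the refit term again
            U[:, t], V[:, t] = u, v
        history.append(objective(rows, cols, vals, U, V, lam))
    return U, V, history
```

Every scalar update exactly minimizes the objective with all other variables held fixed, so `history` is nonincreasing. `lam` must be positive so that users or items with no observed entries do not cause a division by zero.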