- Face and Expression Recognition
- Machine Learning and ELM
- Advanced Bandit Algorithms Research
- Stochastic Gradient Optimization Techniques
- Sparse and Compressive Sensing Techniques
- Bayesian Modeling and Causal Inference
- Neural Networks and Applications
- Natural Language Processing Techniques
- Traffic Prediction and Management Techniques
- Data Management and Algorithms
- Machine Learning and Algorithms
- Machine Learning and Data Classification
- Topic Modeling
- Data Mining Algorithms and Applications
- Human Mobility and Location-Based Analysis
- Rough Sets and Fuzzy Logic
- Fault Detection and Control Systems
- Domain Adaptation and Few-Shot Learning
- Text and Document Classification Technologies
- Imbalanced Data Classification Techniques
- Anomaly Detection Techniques and Applications
- Cloud Data Security Solutions
- Recommender Systems and Techniques
- Bayesian Methods and Mixture Models
- Markov Chains and Monte Carlo Methods
The University of Tokyo
2011-2022
Tokyo University of the Arts
2020
Tokyo University of Information Sciences
2018
Purdue University West Lafayette
2016
University of California, Santa Cruz
2016
Intel (United Kingdom)
2016
Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on this insight, we propose a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness...
Proceedings of the 2010 SIAM International Conference on Data Mining (SDM). Exact Passive-Aggressive Algorithm for Multiclass Classification Using Support Class. Shin Matsushima, Nobuyuki Shimizu, Kazuhiro Yoshida, Takashi Ninomiya, and Hiroshi Nakagawa. pp. 303-314 (Chapter DOI: https://doi.org/10.1137/1.9781611972801.27). Abstract: The Passive Aggressive...
Modern computer hardware offers an elaborate hierarchy of storage subsystems with different speeds, capacities, and costs associated with them. Furthermore, processors are now inherently parallel, offering the execution of several diverse threads simultaneously. This paper proposes StreamSVM, the first algorithm for training linear Support Vector Machines (SVMs) which takes advantage of these properties by integrating caching with optimization. StreamSVM works by performing updates in the dual, thus obviating the need to...
At present, a large amount of traffic-related data is obtained manually and through sensors and social media, e.g., traffic statistics, accident statistics, road information, and users' comments. In this paper, we propose a novel framework for mining traffic risk from such heterogeneous data. Traffic risk refers to the possibility of the occurrence of traffic accidents. Specifically, we focus on two issues: 1) predicting the number of accidents on any road or at any intersection and 2) clustering roads to identify the risk factors for risky clusters. We present a unified approach...
We study the problem of scaling Multinomial Logistic Regression (MLR) to datasets with a very large number of data points in the presence of a large number of classes. At a scale where neither the data nor the parameters are able to fit on a single machine, we argue that simultaneous data and model parallelism (Hybrid Parallelism) is inevitable. The key challenge in achieving such a form of parallelism in MLR is the log-partition function, which needs to be computed across all K classes per data point, thus making model parallelism non-trivial. To overcome this problem, we propose a reformulation...
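To make the role of the log-partition function concrete, here is a minimal Python sketch of the standard per-point MLR loss. It is not the reformulation the abstract proposes (that part is elided in the text above); the function names `log_partition` and `mlr_neg_log_likelihood` and the list-of-lists weight layout are assumptions made for illustration only.

```python
import math


def log_partition(scores):
    """Numerically stable log(sum_k exp(scores[k])) over all K class scores."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def mlr_neg_log_likelihood(W, x, y):
    """Negative log-likelihood of one data point (x, y) under standard MLR.

    W is a list of K weight vectors (one per class), x is a feature vector,
    and y is the index of the true class. Hypothetical layout for this sketch.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W]
    # The log-partition term reads the parameters of every class, which is
    # what couples all K classes together for each single data point.
    return log_partition(scores) - scores[y]
```

Because `log_partition` needs the scores of all K classes, splitting the rows of `W` across machines does not by itself decouple the per-point computation, which is the difficulty the abstract describes.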
Many machine learning algorithms minimize a regularized risk, and stochastic optimization is widely used for this task. When working with massive data, it is desirable to perform stochastic optimization in parallel. Unfortunately, many existing stochastic optimization algorithms cannot be parallelized efficiently. In this paper we show that one can rewrite the regularized risk minimization problem as an equivalent saddle-point problem, and propose an efficient distributed stochastic optimization (DSO) algorithm. We prove the algorithm's rate of convergence; remarkably, our analysis shows that the algorithm...
We propose a totally corrective boosting algorithm with explicit cardinality regularization. The resulting combinatorial optimization problems are not known to be efficiently solvable with existing classical methods, but emerging quantum technology gives hope for achieving sparser models in practice. In order to demonstrate the utility of our algorithm, we use a distributed heuristic optimizer as a stand-in for quantum hardware. Even though this evaluation methodology incurs large time and resource costs on...
Lately, a large amount of traffic-related data, such as traffic statistics, accident statistics, road information, and drivers' and pedestrians' comments, has been collected through sensors and social media networks. In this paper, we propose a novel framework for mining traffic risk from such heterogeneous data. Traffic risk refers to the possibility of traffic accidents occurring. We specifically focus on two issues: 1) predicting the number of accidents for any intersection and 2) clustering roads to identify the risk factors that are common to risky clusters. We followed a unifying...
A large amount of traffic-related data, including traffic statistics, accident statistics, road information, and drivers' and pedestrians' comments, is being collected through sensors and social media networks. We focus on the issue of extracting traffic risk factors from such heterogeneous data and ranking locations according to the extracted factors. In general, it is difficult to define traffic risk. We may adopt a clustering approach to identify groups of risky locations, where a risk factor is extracted by comparing the groups. Furthermore, we utilize prior knowledge...
We are concerned with the issue of discovering behavioral patterns on the web. When a large amount of web access logs are given, we are interested in how they are categorized and how they are related to activities in real life. In order to conduct that analysis, we develop a novel algorithm for sparse non-negative matrix factorization (SNMF), which can discover behaviors. Although there exist a number of variants of SNMFs, our algorithm is novel in that it updates parameters in a multiplicative way with performance guaranteed, and thereby works more robustly than existing ones,...
Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on this insight, we propose a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness...
We propose a new truncation framework for online supervised learning. Learning a compact predictive model in an online setting has recently attracted a great deal of attention. The combination of online learning with sparsity-inducing regularization enables faster learning with a smaller memory space than a conventional learning framework. However, a simple combination of these triggers the truncation of weights whose corresponding features rarely appear, even if these features are crucial for prediction. Furthermore, it is difficult to emphasize these features in advance while preserving the advantages...
Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging. This is primarily because one needs to compute the log-partition function on every data point. This makes distributing the computation hard. In this paper, we present a distributed stochastic gradient descent based optimization method (DS-MLR) for scaling up multinomial logistic regression problems to massive scale without hitting any storage constraints on the data and model parameters. Our algorithm exploits double-separability, an attractive...
Causal discovery in the presence of unobserved common causes from observational data only is a crucial but challenging problem. We categorize all possible causal relationships between two random variables into the following four categories and aim to identify one from observed data: two cases in which either direct causality exists, a case in which the variables are independent, and a case in which the variables are confounded by latent confounders. Although existing methods have been proposed to tackle this problem, they require the variables to satisfy assumptions on the form of their...
Non-negative tensor factorization (NTF) is a widely used multi-way analysis approach that factorizes high-order non-negative data into several factor matrices. In NTF, the rank has to be predetermined to specify the model, and it greatly influences the factorized data. However, its value is conventionally determined by specialists' insights or trial and error. This paper proposes a novel rank selection criterion for NTF on the basis of the minimum description length (MDL) principle. Our methodology is unique in that (1) we apply the MDL...
We consider the class of linear predictors over all logical conjunctions of binary attributes, which we refer to as the class of combinatorial binary models (CBMs) in this paper. CBMs are of high knowledge interpretability, but naïve learning of them from labeled data requires exponentially high computational cost with respect to the length of the conjunctions. On the other hand, in the case of large-scale datasets, long conjunctions are effective for learning predictors. To overcome this computational difficulty, we propose an algorithm, GRAfting for Binary datasets (GRAB), which efficiently learns...
A generalized additive model (GAM, Hastie and Tibshirani (1987)) is a nonparametric model given by the sum of univariate functions with respect to each explanatory variable, i.e., $f({\mathbf x}) = \sum f_j(x_j)$, where $x_j\in\mathbb{R}$ is the $j$-th component of a sample ${\mathbf x}\in \mathbb{R}^p$. In this paper, we introduce the total variation (TV) of a function as a measure of the complexity of functions in $L^1_{\rm c}(\mathbb{R})$-space. Our analysis shows that a GAM based on TV-regularization exhibits a Rademacher complexity of $O(\sqrt{\frac{\log...
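The additive form $f({\mathbf x}) = \sum f_j(x_j)$ and the total variation of a component can be illustrated with a small Python sketch. It assumes each $f_j$ is piecewise constant, in which case its total variation is the sum of the absolute jump sizes; the class `StepFunction` and the function `gam_predict` are hypothetical names for this illustration and are not taken from the paper, whose estimator is not described in the truncated abstract.

```python
from bisect import bisect_right


class StepFunction:
    """Piecewise-constant univariate function (hypothetical helper).

    values[i] is the value on the i-th interval delimited by the sorted
    knots, so len(values) == len(knots) + 1.
    """

    def __init__(self, knots, values):
        if len(values) != len(knots) + 1:
            raise ValueError("need exactly one more value than knots")
        self.knots = list(knots)
        self.values = list(values)

    def __call__(self, x):
        # bisect_right returns the index of the interval that contains x.
        return self.values[bisect_right(self.knots, x)]

    def total_variation(self):
        # For a step function, TV is the sum of absolute jumps at the knots.
        return sum(abs(b - a) for a, b in zip(self.values, self.values[1:]))


def gam_predict(components, x):
    """Evaluate f(x) = sum_j f_j(x_j) for a list of univariate components."""
    return sum(f_j(x_j) for f_j, x_j in zip(components, x))
```

For example, with `f1 = StepFunction([0.0], [1.0, 3.0])` and `f2 = StepFunction([1.0, 2.0], [0.0, -1.0, 0.5])`, the call `gam_predict([f1, f2], [0.5, 1.5])` adds `f1(0.5)` and `f2(1.5)`, and a TV penalty would sum `f1.total_variation()` and `f2.total_variation()`.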