NFDI4DS | UHH-SEMS - Publication Details

Shin Matsushima

ORCID: 0000-0002-8160-4310

About

Contact & Profiles

A5102724770

Research Areas

Face and Expression Recognition
Machine Learning and ELM
Advanced Bandit Algorithms Research
Stochastic Gradient Optimization Techniques
Sparse and Compressive Sensing Techniques
Bayesian Modeling and Causal Inference
Neural Networks and Applications
Natural Language Processing Techniques
Traffic Prediction and Management Techniques
Data Management and Algorithms
Machine Learning and Algorithms
Machine Learning and Data Classification
Topic Modeling
Data Mining Algorithms and Applications
Human Mobility and Location-Based Analysis
Rough Sets and Fuzzy Logic
Fault Detection and Control Systems
Domain Adaptation and Few-Shot Learning
Text and Document Classification Technologies
Imbalanced Data Classification Techniques
Anomaly Detection Techniques and Applications
Cloud Data Security Solutions
Recommender Systems and Techniques
Bayesian Methods and Mixture Models
Markov Chains and Monte Carlo Methods

The University of Tokyo
2011-2022

Tokyo University of the Arts
2020

Tokyo University of Information Sciences
2018

Purdue University West Lafayette
2016

University of California, Santa Cruz
2016

Intel (United Kingdom)
2016

WordRank: Learning Word Embeddings via Robust Ranking

OPENALEX - Publications

Shihao Ji Hyokun Yun Pinar Yanardag Shin Matsushima S. V. N. Vishwanathan

Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on this insight, we propose a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness...

10.18653/v1/d16-1063 article EN cc-by Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016-01-01

Exact Passive-Aggressive Algorithm for Multiclass Classification Using Support Class

Shin Matsushima Nobuyuki Shimizu Kazuhiro Yoshida Takashi Ninomiya Hiroshi Nakagawa

Proceedings of the 2010 SIAM International Conference on Data Mining (SDM), pp. 303-314. Abstract: The Passive Aggressive...

10.1137/1.9781611972801.27 article EN 2010-04-29

Linear support vector machines via dual cached loops

Shin Matsushima S. V. N. Vishwanathan Alexander J. Smola

Modern computer hardware offers an elaborate hierarchy of storage subsystems with different speeds, capacities, and costs associated with them. Furthermore, processors are now inherently parallel, offering the execution of several diverse threads simultaneously. This paper proposes StreamSVM, the first algorithm for training linear Support Vector Machines (SVMs) which takes advantage of these properties by integrating caching with optimization. StreamSVM works by performing updates in the dual, thus obviating the need to...

10.1145/2339530.2339559 article EN 2012-08-12

Traffic Risk Mining From Heterogeneous Road Statistics

Koichi Moriya Shin Matsushima Kenji Yamanishi

At present, a large amount of traffic-related data is obtained manually and through sensors and social media, e.g., traffic statistics, accident statistics, road information, and users' comments. In this paper, we propose a novel framework for mining traffic risk from such heterogeneous data. Traffic risk refers to the possibility of occurrence of traffic accidents. Specifically, we focus on two issues: 1) predicting the number of accidents on any road or at any intersection and 2) clustering roads to identify risk factors for risky road clusters. We present a unified approach...

10.1109/tits.2018.2856533 article EN IEEE Transactions on Intelligent Transportation Systems 2018-09-13

Scaling Multinomial Logistic Regression via Hybrid Parallelism

Parameswaran Raman Sriram Srinivasan Shin Matsushima Xinhua Zhang Hyokun Yun and 1 more

We study the problem of scaling Multinomial Logistic Regression (MLR) to datasets with a very large number of data points in the presence of a large number of classes. At a scale where neither the data nor the parameters are able to fit on a single machine, we argue that simultaneous data and model parallelism (Hybrid Parallelism) is inevitable. The key challenge in achieving such a form of parallelism in MLR is the log-partition function, which needs to be computed across all K classes per data point, thus making model parallelism non-trivial. To overcome this problem, we propose a reformulation...

10.1145/3292500.3330837 article EN 2019-07-25

Distributed Stochastic Optimization of the Regularized Risk

Shin Matsushima Hyokun Yun Xinhua Zhang S. V. N. Vishwanathan

Many machine learning algorithms minimize a regularized risk, and stochastic optimization is widely used for this task. When working with massive data, it is desirable to perform stochastic optimization in parallel. Unfortunately, many existing stochastic optimization algorithms cannot be parallelized efficiently. In this paper we show that one can rewrite the regularized risk minimization problem as an equivalent saddle-point problem, and propose an efficient distributed stochastic optimization (DSO) algorithm. We prove the algorithm's rate of convergence; remarkably, our analysis shows that the algorithm...

10.48550/arxiv.1406.4363 preprint EN other-oa arXiv (Cornell University) 2014-01-01

Totally Corrective Boosting with Cardinality Penalization

Vasil S. Denchev Nan Ding Shin Matsushima S. V. N. Vishwanathan Hartmut Neven

We propose a totally corrective boosting algorithm with explicit cardinality regularization. The resulting combinatorial optimization problems are not known to be efficiently solvable with existing classical methods, but emerging quantum technology gives hope for achieving sparser models in practice. In order to demonstrate the utility of our algorithm, we use a distributed heuristic optimizer as a stand-in for quantum hardware. Even though this evaluation methodology incurs large time and resource costs on...

10.48550/arxiv.1504.01446 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Traffic risk mining from heterogeneous road statistics

Koichi Moriya Shin Matsushima Kenji Yamanishi

Lately, a large amount of traffic-related data, such as traffic statistics, accident statistics, road information, and drivers' and pedestrians' comments, has been collected through sensors and social media networks. In this paper, we propose a novel framework for mining traffic risk from such heterogeneous data. Traffic risk refers to the possibility of traffic accidents occurring. We specifically focus on two issues: 1) predicting the number of accidents on any road or at any intersection and 2) clustering roads to identify the risk factors that are common to risky road clusters. We followed a unifying...

10.1109/dsaa.2015.7344889 article EN 2015-10-01

Traffic Risk Mining Using Partially Ordered Non-Negative Matrix Factorization

Taito Lee Shin Matsushima Kenji Yamanishi

A large amount of traffic-related data, including traffic statistics, accident statistics, road information, and drivers' and pedestrians' comments, is being collected through sensors and social media networks. We focus on the issue of extracting traffic risk factors from such heterogeneous data and ranking locations according to the extracted factors. In general, it is difficult to define traffic risk. We may adopt a clustering approach to identify groups of risky locations, where the risk factor is extracted by comparing the groups. Furthermore, we utilize prior knowledge...

10.1109/dsaa.2016.71 article EN 2016-10-01

Web Behavior Analysis Using Sparse Non-Negative Matrix Factorization

Akihiro Demachi Shin Matsushima Kenji Yamanishi

We are concerned with the issue of discovering behavioral patterns on the web. When a large amount of web access logs is given, we are interested in how they are categorized and how they are related to activities in real life. In order to conduct that analysis, we develop a novel algorithm for sparse non-negative matrix factorization (SNMF), which can discover behaviors. Although there exist a number of variants of SNMFs, our algorithm is novel in that it updates parameters in a multiplicative way with performance guaranteed, and thereby works more robustly than existing ones,...

10.1109/dsaa.2016.85 article EN 2016-10-01

Feature-aware regularization for sparse online learning

Hidekazu Oiwa Shin Matsushima Hiroshi Nakagawa

10.1007/s11432-014-5082-z article EN Science China Information Sciences 2014-04-21

WordRank: Learning Word Embeddings via Robust Ranking

Shihao Ji Hyokun Yun Pinar Yanardag Shin Matsushima S. V. N. Vishwanathan

Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on this insight, we propose a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness...

10.48550/arxiv.1506.02761 preprint EN other-oa arXiv (Cornell University) 2015-01-01

Healing Truncation Bias: Self-Weighted Truncation Framework for Dual Averaging

Hidekazu Oiwa Shin Matsushima Hiroshi Nakagawa

We propose a new truncation framework for online supervised learning. Learning a compact predictive model in an online setting has recently attracted a great deal of attention. The combination of online learning with sparsity-inducing regularization enables faster learning with a smaller memory space than the conventional learning framework. However, a simple combination of these triggers the truncation of weights whose corresponding features rarely appear, even if these features are crucial for prediction. Furthermore, it is difficult to emphasize these features in advance while preserving the advantages...

10.1109/icdm.2012.33 article EN 2012-12-01

DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression

Parameswaran Raman Sriram Srinivasan Shin Matsushima Xinhua Zhang Hyokun Yun and 1 more

Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging. This is primarily because one needs to compute the log-partition function on every data point. This makes distributing the computation hard. In this paper, we present a distributed stochastic gradient descent based optimization method (DS-MLR) for scaling up multinomial logistic regression problems to massive scale datasets without hitting any storage constraints on the data and model parameters. Our algorithm exploits double-separability, an attractive...

10.48550/arxiv.1604.04706 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Detection of Unobserved Common Causes based on NML Code in Discrete, Mixed, and Continuous Variables

Masatoshi Kobayashi Kohei Miyaguchi Shin Matsushima

Causal discovery in the presence of unobserved common causes from observational data only is a crucial but challenging problem. We categorize all possible causal relationships between two random variables into the following four categories and aim to identify one from the observed data: two cases in which either direction of direct causality exists, a case in which the variables are independent, and a case in which the variables are confounded by latent confounders. Although existing methods have been proposed to tackle this problem, they require the variables to satisfy assumptions on the form of their...

10.48550/arxiv.2403.06499 preprint EN arXiv (Cornell University) 2024-03-11

Model Selection for Non-Negative Tensor Factorization with Minimum Description Length

Yunhui Fu Shin Matsushima Kenji Yamanishi

Non-negative tensor factorization (NTF) is a widely used multi-way analysis approach that factorizes high-order non-negative data into several factor matrices. In NTF, the rank has to be predetermined to specify the model, and it greatly influences the factorized matrices. However, its value is conventionally determined by specialists' insights or by trial and error. This paper proposes a novel rank selection criterion for NTF on the basis of the minimum description length (MDL) principle. Our methodology is unique in that (1) we apply the MDL...

10.3390/e21070632 article EN cc-by Entropy 2019-06-27

Grafting for combinatorial binary model using frequent itemset mining

Taito Lee Shin Matsushima Kenji Yamanishi

We consider the class of linear predictors over all logical conjunctions of binary attributes, which we refer to as the class of combinatorial binary models (CBMs) in this paper. CBMs are of high knowledge interpretability, but naïve learning of them from labeled data requires exponentially high computational cost with respect to the length of the conjunctions. On the other hand, in the case of large-scale datasets, long conjunctions are effective for learning predictors. To overcome this computational difficulty, we propose an algorithm, GRAfting for Binary datasets (GRAB), which efficiently learns...

10.1007/s10618-019-00657-9 article EN cc-by Data Mining and Knowledge Discovery 2019-10-28

Statistical Learnability of Generalized Additive Models based on Total Variation Regularization

Shin Matsushima

A generalized additive model (GAM, Hastie and Tibshirani (1987)) is a nonparametric model given by the sum of univariate functions with respect to each explanatory variable, i.e., $f({\mathbf x}) = \sum f_j(x_j)$, where $x_j\in\mathbb{R}$ is the $j$-th component of a sample ${\mathbf x}\in \mathbb{R}^p$. In this paper, we introduce the total variation (TV) of a function as a measure of the complexity of functions in $L^1_{\rm c}(\mathbb{R})$-space. Our analysis shows that a GAM based on TV-regularization exhibits a Rademacher complexity of $O(\sqrt{\frac{\log...

10.48550/arxiv.1802.03001 preprint EN other-oa arXiv (Cornell University) 2018-01-01
