Shipeng Yu

ORCID: 0000-0002-0262-4031
Research Areas
  • Bayesian Methods and Mixture Models
  • Artificial Intelligence in Healthcare
  • Machine Learning in Healthcare
  • Face and Expression Recognition
  • AI in cancer detection
  • Image Retrieval and Classification Techniques
  • Soil Geostatistics and Mapping
  • Machine Learning and Data Classification
  • Mobile Crowdsensing and Crowdsourcing
  • Machine Learning and Algorithms
  • Text and Document Classification Technologies
  • Soil Moisture and Remote Sensing
  • Hydrological Forecasting Using AI
  • Radiomics and Machine Learning in Medical Imaging
  • Neural Networks and Applications
  • Gaussian Processes and Bayesian Inference
  • Web Data Mining and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Topic Modeling
  • Sparse and Compressive Sensing Techniques
  • Complex Network Analysis Techniques
  • Biomedical Text Mining and Ontologies
  • Auction Theory and Applications
  • Blood Pressure and Hypertension Studies
  • Advanced Image and Video Retrieval Techniques

Jiangxi University of Science and Technology
2023

LinkedIn (United States)
2016-2023

Chengdu University of Information Technology
2022

Yunnan University
2021

Siemens Healthcare (United States)
2009-2016

Beijing University of Posts and Telecommunications
2015-2016

Chinese Academy of Sciences
2012-2016

Institute of Soil Science
2012-2016

University of Toledo
2015

Siemens (Germany)
2006-2013

For many supervised learning tasks it may be infeasible (or very expensive) to obtain objective and reliable labels. Instead, we can collect subjective (possibly noisy) labels from multiple experts or annotators. In practice, there is a substantial amount of disagreement among the annotators, and hence it is of great practical interest to address conventional supervised learning problems in this scenario. In this paper we describe a probabilistic approach for supervised learning when we have multiple annotators providing (possibly noisy) labels but no absolute gold standard. The proposed...

10.5555/1756006.1859894 article EN Journal of Machine Learning Research 2010-03-01

We describe a probabilistic approach for supervised learning when we have multiple experts/annotators providing (possibly noisy) labels but no absolute gold standard. The proposed algorithm evaluates the different experts and also gives an estimate of the actual hidden labels. Experimental results indicate that the proposed method is superior to the commonly used majority voting baseline.

10.1145/1553374.1553488 article EN 2009-06-14
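
For reference, the majority voting baseline mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the proposed probabilistic algorithm; the function name `majority_vote` and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def majority_vote(annotations):
    """Return one consensus label per instance.

    annotations[i] is the list of labels that the annotators assigned
    to instance i (annotators may differ between instances).
    """
    consensus = []
    for labels in annotations:
        counts = Counter(labels)
        top = max(counts.values())
        # Tie-breaking rule (an assumption for this sketch):
        # choose the smallest label among the most frequent ones.
        consensus.append(min(label for label, c in counts.items() if c == top))
    return consensus


# Three instances; the last one is a tie that resolves to 0.
votes = [[1, 1, 0], [0, 0, 1], [1, 0]]
result = majority_vote(votes)
```

Majority voting treats every annotator as equally reliable, which is the limitation an approach that evaluates the individual experts is meant to address.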

With the advent of crowdsourcing services it has become quite cheap and reasonably effective to get a data set labeled by multiple annotators in a short amount of time. Various methods have been proposed to estimate the consensus labels by correcting for the bias of annotators with different kinds of expertise. Since we do not have control over the quality of the annotators, very often the annotations can be dominated by spammers, defined as annotators who assign labels randomly without actually looking at the instance. Spammers can make the cost of acquiring labels expensive and potentially...

10.5555/2188385.2188401 article EN Journal of Machine Learning Research 2012-01-01
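
To make the definition of a spammer concrete: an annotator who assigns binary labels independently of the instance has a true-positive rate equal to their false-positive rate, so sensitivity plus specificity equals 1. The sketch below measures the distance from that condition. It assumes binary labels and a known reference labeling, both of which are simplifications for illustration; the name `informativeness` is hypothetical and the code is not presented as the paper's scoring method.

```python
def informativeness(true_labels, given_labels):
    """Return |sensitivity + specificity - 1| for one annotator.

    Values near 0 mean the labels look independent of the truth
    (spammer-like); values near 1 mean the annotator is informative.
    Assumes binary labels (0/1) and that both classes occur in true_labels.
    """
    pos = [g for t, g in zip(true_labels, given_labels) if t == 1]
    neg = [g for t, g in zip(true_labels, given_labels) if t == 0]
    sensitivity = sum(1 for g in pos if g == 1) / len(pos)
    specificity = sum(1 for g in neg if g == 0) / len(neg)
    return abs(sensitivity + specificity - 1.0)


truth = [1, 1, 0, 0]
perfect_score = informativeness(truth, [1, 1, 0, 0])    # labels match the truth
always_one_score = informativeness(truth, [1, 1, 1, 1])  # ignores the instance
```

An annotator who always answers 1 has sensitivity 1 and specificity 0, so the score is 0, the same as for a uniformly random labeler in expectation.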

In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure in a web page. Compared with a simple DOM-based segmentation method, our page segmentation scheme utilizes useful visual cues to obtain a better partition of a page at the semantic level. By using VIPS to assist the selection of query...

10.1145/775152.775155 article EN 2003-01-01

Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However, if the output information (i.e. category labels) is available, it is often beneficial to derive the indexing not only based on the inputs but also on the target values in the training data set. This is of particular importance in applications with multiple labels, in which each document can belong to several categories simultaneously. In this paper we introduce the multi-label informed latent semantic indexing (MLSI) algorithm, which preserves and...

10.1145/1076034.1076080 article EN 2005-08-15

Multi-label problems arise in various domains such as multi-topic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general framework for extracting shared...

10.1145/1401890.1401939 article EN 2008-08-24
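
The "one binary classifier per label" decomposition described in the abstract above can be illustrated by the data transformation it implies. The sketch only builds the per-label 0/1 targets; the classifiers themselves and the shared-structure framework of the paper are not shown. The function name `binary_relevance_targets` and the example labels are assumptions for illustration.

```python
def binary_relevance_targets(label_sets, all_labels):
    """Turn multi-label annotations into one 0/1 target vector per label.

    label_sets[i] is the set of labels attached to document i.
    Each returned vector defines an independent binary problem.
    """
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


# Three documents; the third has no labels at all.
docs = [{"sports", "politics"}, {"politics"}, set()]
targets = binary_relevance_targets(docs, ["sports", "politics"])
```

Because every target vector is built in isolation, any correlation between labels is ignored, which is the gap that motivates exploiting shared information across labels.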

Multi-label problems arise in various domains such as multi-topic document categorization, protein function prediction, and automatic image annotation. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general...

10.1145/1754428.1754431 article EN ACM Transactions on Knowledge Discovery from Data 2010-05-01

We provide an overview of the recent trends toward digitalization and large-scale data analytics in healthcare. It is expected that these trends are instrumental in the dramatic changes in the way healthcare will be organized in the future. We discuss the recent political initiatives designed to shift care delivery processes from paper to electronic, with the goals of more effective treatments with better outcomes; cost pressure is a major driver of innovation. We describe newly developed networks of healthcare providers, research organizations, and commercial vendors...

10.1109/jproc.2016.2615052 article EN Proceedings of the IEEE 2016-10-19

The multiple topics and varying length of web pages are two negative factors significantly affecting the performance of web search. In this paper, we explore the use of page segmentation algorithms to partition web pages into blocks and investigate how to take advantage of block-level evidence to improve retrieval performance in the web context. Because of the special characteristics of web pages, different page segmentation methods will have different impacts on web search performance. We compare four types of methods, including fixed-length page segmentation, DOM-based page segmentation, vision-based page segmentation, and a combined method which...

10.1145/1008992.1009070 article EN 2004-07-25
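
Of the four segmentation methods compared in the abstract above, fixed-length segmentation is the simplest to sketch. The version below splits a page's text into consecutive, non-overlapping blocks of a fixed number of words. The non-overlapping windows, the default block size, and the name `fixed_length_blocks` are assumptions for this illustration, not details taken from the paper.

```python
def fixed_length_blocks(text, block_size=200):
    """Split text into consecutive blocks of at most block_size words."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + block_size])
        for i in range(0, len(words), block_size)
    ]


# A five-word "page" cut into blocks of two words each.
blocks = fixed_length_blocks("a b c d e", block_size=2)
```

Each block can then be scored against a query on its own, which is one way to collect block-level evidence without any knowledge of the page's DOM or visual layout.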

Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g., in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e., in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S2PPCA, both of which are extensions of a probabilistic PCA model. The proposed...

10.1145/1150402.1150454 article EN 2006-08-20

Purpose: Classic statistical and machine learning models such as support vector machines (SVMs) can be used to predict cancer outcome, but often only perform well if all the input variables are known, which is unlikely in the medical domain. Bayesian network (BN) models have a natural ability to reason under uncertainty and might handle missing data better. In this study, the authors hypothesize that a BN model can predict two-year survival in non-small cell lung cancer (NSCLC) patients as accurately as SVM, but will predict survival more accurately when data are missing. Methods: A...

10.1118/1.3352709 article EN Medical Physics 2010-03-09

Co-training (or more generally, co-regularization) has been a popular algorithm for semi-supervised learning in data with two feature representations (or views), but the fundamental assumptions underlying this type of models are still unclear. In this paper we propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new...

10.5555/1953048.2078190 article EN Journal of Machine Learning Research 2011-02-01

Most current multi-task learning frameworks ignore the robustness issue, which means that the presence of "outlier" tasks may greatly reduce overall system performance. We introduce a robust framework for Bayesian multitask learning, t-processes (TP), which are a generalization of Gaussian processes (GP) for multi-task learning. TP allows the system to effectively distinguish good tasks from noisy or outlier tasks. Experiments show that TP not only improves overall system performance, but can also serve as an indicator for the "informativeness" of different tasks.

10.1145/1273496.1273635 article EN 2007-06-20