John Lafferty

ORCID: 0000-0002-5929-220X
Research Areas
  • Statistical Methods and Inference
  • Natural Language Processing Techniques
  • Bayesian Modeling and Causal Inference
  • Topic Modeling
  • Bayesian Methods and Mixture Models
  • Neural Networks and Applications
  • Machine Learning and Algorithms
  • Sparse and Compressive Sensing Techniques
  • Hemoglobinopathies and Related Disorders
  • Algorithms and Data Compression
  • Advanced Statistical Methods and Models
  • Face and Expression Recognition
  • Text and Document Classification Technologies
  • Information Retrieval and Search Behavior
  • Machine Learning and Data Classification
  • Distributed Sensor Networks and Detection Algorithms
  • Iron Metabolism and Disorders
  • Statistical Methods and Bayesian Inference
  • Stochastic Gradient Optimization Techniques
  • Gene expression and cancer classification
  • Blood groups and transfusion
  • Advanced Text Analysis Techniques
  • Speech Recognition and Synthesis
  • Markov Chains and Monte Carlo Methods
  • Gaussian Processes and Bayesian Inference

Yale University
2015-2024

Carnegie Mellon University
2007-2020

Johns Hopkins University
2012-2020

University of Chicago
2012-2017

New Mexico Institute of Mining and Technology
2016

Amazon (United States)
2016

University of Pennsylvania
2016

University of Illinois Chicago
2015

Princeton University
2007-2014

Stanford University
2014

A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of the collection...

10.1145/1143844.1143859 article EN 2006-01-01
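
The core idea described above is that a topic's word distribution drifts over time through a state-space model on its natural parameters. A minimal sketch of that generative ingredient, with an assumed toy vocabulary size, number of time slices, and random-walk variance (not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8          # toy vocabulary size (assumption for illustration)
T = 5          # number of time slices (assumption)
sigma = 0.1    # random-walk standard deviation (assumption)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Natural parameters of one topic evolve by a Gaussian random walk;
# each time slice's word distribution is the softmax of its parameters.
beta = np.zeros((T, V))
beta[0] = rng.normal(0.0, 1.0, size=V)
for t in range(1, T):
    beta[t] = beta[t - 1] + rng.normal(0.0, sigma, size=V)

topic_over_time = np.array([softmax(beta[t]) for t in range(T)])
print(topic_over_time.round(3))
```

Posterior inference in the paper is variational (Kalman filtering or wavelet regression over these latent trajectories); the sketch only shows the forward model.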

In this paper, we present a statistical approach to machine translation. We describe the application of our approach to translation from French to English and give preliminary results.

10.5555/92858.92860 article EN Computational Linguistics 1990-06-01
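
The statistical approach of this era is usually presented through a source-channel decision rule: rank candidate English sentences e for a French sentence f by P(e)P(f | e). A toy sketch of that scoring rule, where the probability tables are illustrative assumptions rather than trained models:

```python
import math

# Toy source-channel scoring: rank candidate English sentences e for a
# French sentence f by log P(e) + log P(f | e). The tables below are
# illustrative assumptions, not trained models.
language_model = {"the cat": 0.6, "cat the": 0.1}
translation_model = {("le chat", "the cat"): 0.5, ("le chat", "cat the"): 0.4}

def score(f, e):
    return math.log(language_model[e]) + math.log(translation_model[(f, e)])

f = "le chat"
best = max(language_model, key=lambda e: score(f, e))
print(best)  # "the cat"
```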

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document and then rank documents by the likelihood of the query according to the estimated model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator to correct the inaccuracy due to data sparseness. In this paper, we study...

10.1145/3130348.3130377 article EN ACM SIGIR Forum 2017-08-02
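
A small sketch of the ranking rule this line of work studies: score documents by the likelihood of the query under a smoothed document language model. Dirichlet-prior smoothing is used here as one of the smoothing methods the paper compares; the corpus and the value of mu are assumptions for illustration.

```python
import math
from collections import Counter

# Query-likelihood ranking with Dirichlet-prior smoothing: the document
# model is smoothed toward the collection model, correcting the maximum
# likelihood estimate for query terms unseen in the document.
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "speech recognition with statistical language models".split(),
}
mu = 10.0  # smoothing parameter (assumption)
collection = Counter(w for words in docs.values() for w in words)
coll_total = sum(collection.values())

def log_query_likelihood(query, doc_words):
    tf = Counter(doc_words)
    dlen = len(doc_words)
    score = 0.0
    for w in query:
        p_coll = collection[w] / coll_total
        p = (tf[w] + mu * p_coll) / (dlen + mu)
        score += math.log(p) if p > 0 else float("-inf")
    return score

query = "statistical language models".split()
ranking = sorted(docs, key=lambda d: log_query_likelihood(query, docs[d]), reverse=True)
print(ranking)
```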

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document and then rank documents by the likelihood of the query according to the estimated model. A central issue in language model estimation is smoothing, the problem of adjusting the maximum likelihood estimator to compensate for data sparseness. In this article, we study its...

10.1145/984321.984322 article EN ACM transactions on office information systems 2004-04-01

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is its inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than about X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions...

10.1214/07-aoas114 article EN The Annals of Applied Statistics 2007-06-01
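
The ingredient that distinguishes the CTM from LDA is a logistic normal distribution over topic proportions, which can encode correlation between topics. A sketch under assumed toy parameters (the mean and covariance below are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Logistic-normal topic proportions: a correlated Gaussian draw is pushed
# through the softmax, so positive covariance between two topics makes them
# tend to co-occur in a document. mu and Sigma are illustrative assumptions.
K = 3
mu = np.zeros(K)
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])  # topics 0 and 1 are correlated

eta = rng.multivariate_normal(mu, Sigma, size=1000)
theta = np.exp(eta)
theta /= theta.sum(axis=1, keepdims=True)

# Empirical correlation of the first two topic proportions (typically
# positive here), something a Dirichlet prior cannot express.
print(np.corrcoef(theta[:, 0], theta[:, 1])[0, 1])
```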

We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field, and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques...

10.1109/34.588021 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 1997-04-01
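
A minimal log-linear (exponential-family) model of the kind described above, fit on a tiny discrete space. The paper trains weights by iterative scaling and grows the feature set greedily; plain gradient ascent on the log-likelihood (equivalently, minimizing KL from the empirical distribution) is substituted here for brevity, and the features and data are illustrative assumptions.

```python
import numpy as np

states = [(a, b) for a in (0, 1) for b in (0, 1)]
features = [
    lambda s: float(s[0] == 1),
    lambda s: float(s[1] == 1),
    lambda s: float(s[0] == s[1]),   # a feature supported on a larger subgraph
]
data = [(1, 1), (1, 1), (0, 0), (1, 0)]

def model_probs(w):
    scores = np.array([sum(wk * f(s) for wk, f in zip(w, features)) for s in states])
    p = np.exp(scores - scores.max())
    return p / p.sum()

# Empirical feature expectations from the training samples.
emp = np.array([np.mean([f(s) for s in data]) for f in features])

w = np.zeros(len(features))
for _ in range(500):
    p = model_probs(w)
    exp_feats = np.array([sum(p[i] * f(s) for i, s in enumerate(states)) for f in features])
    w += 0.5 * (emp - exp_feats)   # gradient of the average log-likelihood

print(w.round(2), model_probs(w).round(3))
```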

We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and the maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently...

10.1214/09-aos691 article EN other-oa The Annals of Statistics 2010-03-08
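
The estimator itself is simple to sketch: regress each node on all the others with ℓ1-penalized logistic regression and read its estimated neighborhood off the nonzero coefficients. The synthetic data and the regularization level below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary data (assumption): X0 and X1 are strongly coupled,
# X2 and X3 are independent noise.
n = 2000
x0 = rng.integers(0, 2, size=n)
x1 = (x0 ^ (rng.random(n) < 0.1)).astype(int)   # mostly copies x0
x2 = rng.integers(0, 2, size=n)
x3 = rng.integers(0, 2, size=n)
X = np.column_stack([x0, x1, x2, x3])

p = X.shape[1]
for j in range(p):
    others = np.delete(np.arange(p), j)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X[:, others], X[:, j])
    neighbors = others[np.abs(clf.coef_[0]) > 1e-6]
    print(f"node {j}: estimated neighbors {neighbors.tolist()}")
```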

The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this approach: the original query is usually literally expanded by adding additional terms to it. Such expansion-based feedback creates an inconsistent interpretation of the original and the expanded query. In this paper, we present a more principled approach to feedback. Specifically, we treat feedback as updating the query language model based on the extra...

10.1145/502585.502654 article EN 2001-10-05
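
The contrast with term expansion can be shown in a few lines: feedback updates the query language model itself, commonly by interpolating it with a model estimated from feedback documents. The interpolation weight and toy texts are assumptions for illustration.

```python
from collections import Counter

def unigram_model(words):
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

alpha = 0.5  # feedback interpolation weight (assumption)
query_model = unigram_model("statistical translation".split())
feedback_model = unigram_model(
    "machine translation alignment statistical models translation".split()
)

# Model-based feedback: a new query model, not a longer query string.
vocab = set(query_model) | set(feedback_model)
updated = {w: (1 - alpha) * query_model.get(w, 0.0)
              + alpha * feedback_model.get(w, 0.0) for w in vocab}
print(sorted(updated.items(), key=lambda kv: -kv[1]))
```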

We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to retrieval. A language model for each document is estimated, as well as for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy and word senses. While previous work has incorporated translation models for this purpose, we introduce a new method...

10.1145/3130348.3130375 article EN ACM SIGIR Forum 2017-08-02
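
One commonly cited special case of this risk-minimization view ranks documents by the KL divergence between the query model and a smoothed document model. A sketch of that ranking rule, with an assumed toy corpus and smoothing constant:

```python
import math
from collections import Counter

docs = {
    "d1": "bayesian decision theory for retrieval".split(),
    "d2": "word senses and synonymy in context".split(),
}
collection = Counter(w for ws in docs.values() for w in ws)
coll_total = sum(collection.values())
mu = 5.0  # Dirichlet smoothing constant (assumption)

def doc_model(words):
    tf, dlen = Counter(words), len(words)
    return lambda w: (tf[w] + mu * collection[w] / coll_total) / (dlen + mu)

def neg_kl_score(query_model, p_doc):
    # Ranking by -KL(query || doc) reduces to sum_w q(w) log p_doc(w) + const.
    return sum(q * math.log(p_doc(w)) for w, q in query_model.items() if p_doc(w) > 0)

query = "bayesian retrieval".split()
q_model = {w: c / len(query) for w, c in Counter(query).items()}
ranking = sorted(docs, key=lambda d: neg_kl_score(q_model, doc_model(docs[d])), reverse=True)
print(ranking)
```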

Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula, or nonparanormal, for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method's theoretical properties, and show that it works well in many...

10.1184/r1/6610712.v1 article EN Journal of Machine Learning Research 2009-12-01
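
The nonparanormal idea in miniature: marginally transform each variable by a truncated empirical CDF followed by the Gaussian quantile function, then apply machinery that assumes normality to the transformed data. The nonlinearly distorted Gaussian sample and the truncation level below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)

n, d = 500, 3
Z = rng.multivariate_normal(np.zeros(d), [[1, .6, 0], [.6, 1, 0], [0, 0, 1]], size=n)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3, Z[:, 2]])  # monotone distortions

def nonparanormal_transform(x, delta):
    u = rankdata(x) / (len(x) + 1)       # scaled empirical CDF
    u = np.clip(u, delta, 1 - delta)     # truncation for numerical stability
    return norm.ppf(u)

delta = 0.01  # truncation level (assumption; the paper derives a specific rate)
F = np.column_stack([nonparanormal_transform(X[:, j], delta) for j in range(d)])

# Correlation recovered on the transformed scale is close to the latent 0.6,
# despite the nonlinear marginal distortions of X.
print(np.corrcoef(F[:, 0], F[:, 1])[0, 1].round(2))
```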

We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a statistical model of how a user might distill or "translate" a given document into a query. To assess the relevance of a document to a user's query, we estimate the probability that the query would have been generated as a translation of the document, and factor in the user's general preferences in the form of a prior distribution over documents. We propose a simple, well motivated model of the document-to-query translation process, and describe an...

10.1145/3130348.3130371 article EN ACM SIGIR Forum 2017-08-02
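
A toy sketch of the document-to-query view: score a document by the probability that the query was generated as a "translation" of it, so synonyms of document words contribute to the query likelihood. The translation table and documents below are illustrative assumptions, and the document prior is taken as uniform.

```python
t = {  # t[q_word][d_word] = P(query word | document word) (toy values)
    "car":  {"car": 0.8, "automobile": 0.7},
    "fast": {"fast": 0.9, "quick": 0.6},
}

def query_likelihood(query, doc_words):
    score = 1.0
    for q in query:
        # Mixture over document words: P(q | d) = (1/|d|) * sum_w t(q | w)
        score *= sum(t.get(q, {}).get(w, 0.0) for w in doc_words) / len(doc_words)
    return score

docs = {"d1": ["automobile", "quick"], "d2": ["car", "slow"]}
query = ["car", "fast"]
for d, words in docs.items():
    print(d, query_likelihood(query, words))
```

Note that d1 scores higher than d2 even though it shares no exact terms with the query, which is the behavior the translation model is meant to capture.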

We present a non-traditional retrieval problem we call subtopic retrieval. The problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking is dependent on the other documents in the ranking, violating the assumption of independent relevance which is made in most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework that generalizes the traditional precision and recall metrics by accounting...

10.1145/2795403.2795405 article EN ACM SIGIR Forum 2015-06-23
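
One coverage-based generalization of recall used in this setting measures, at each rank cutoff, the fraction of a query's subtopics covered by the top-ranked documents. A sketch with assumed subtopic assignments:

```python
# Subtopic recall at rank k: fraction of the query's subtopics covered by the
# top-k documents. The subtopic assignments below are illustrative assumptions.
subtopics_of = {
    "d1": {"s1", "s2"},
    "d2": {"s2"},
    "d3": {"s3"},
}
all_subtopics = {"s1", "s2", "s3"}

def subtopic_recall_at_k(ranking, k):
    covered = set()
    for doc in ranking[:k]:
        covered |= subtopics_of.get(doc, set())
    return len(covered) / len(all_subtopics)

ranking = ["d1", "d2", "d3"]
for k in (1, 2, 3):
    print(k, subtopic_recall_at_k(ranking, k))
```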

Summary. We present a new class of methods for high dimensional non-parametric regression and classification called sparse additive models. Our methods combine ideas from sparse linear modelling and additive non-parametric regression. We derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size. Sparse additive models are essentially a functional version of the grouped lasso of Yuan and Lin. They are also closely related to the COSSO model of Lin and Zhang, but decouple smoothing and sparsity, enabling the use of arbitrary non-parametric smoothers...

10.1111/j.1467-9868.2009.00718.x article EN Journal of the Royal Statistical Society Series B (Statistical Methodology) 2009-10-19
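
The decoupling of smoothing and sparsity can be sketched as a backfitting loop: each coordinate's function is fit by an arbitrary smoother on the partial residual, then shrunk by soft-thresholding its empirical norm, which zeroes out irrelevant covariates. The kernel smoother, bandwidth, threshold, and synthetic data below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, lam, h = 300, 5, 0.15, 0.3
X = rng.uniform(-1, 1, size=(n, p))
y = (np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 - np.mean(X[:, 1] ** 2)
     + 0.1 * rng.normal(size=n))   # only the first two covariates matter

def smooth(x, r, h):
    # Nadaraya-Watson smoother evaluated at the sample points.
    w = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * h ** 2))
    return (w @ r) / w.sum(axis=1)

f = np.zeros((p, n))
for _ in range(20):
    for j in range(p):
        residual = y - f.sum(axis=0) + f[j]
        g = smooth(X[:, j], residual, h)
        g -= g.mean()
        norm = np.sqrt(np.mean(g ** 2))
        # Soft-threshold the component's norm to enforce sparsity.
        f[j] = max(0.0, 1 - lam / norm) * g if norm > 0 else 0.0

# Component norms: typically nonzero only for the first two covariates.
print([round(np.sqrt(np.mean(f[j] ** 2)), 3) for j in range(p)])
```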

We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to retrieval. A language model for each document is estimated, as well as for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy and word senses. While previous work has incorporated translation models for this purpose, we introduce a new method...

10.1145/383952.383970 article EN 2001-09-01

We propose a semiparametric approach called the nonparanormal SKEPTIC for efficiently and robustly estimating high-dimensional undirected graphical models. To achieve modeling flexibility, we consider the nonparanormal graphical models proposed by Liu, Lafferty and Wasserman [J. Mach. Learn. Res. 10 (2009) 2295–2328]. To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman's rho and Kendall's tau. We prove that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergence for both graph...

10.1214/12-aos1037 article EN other-oa The Annals of Statistics 2012-08-01
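
The rank-based plug-in at the heart of this estimator is easy to sketch: estimate the latent Gaussian correlation from Kendall's tau via sin((pi/2) tau), which is invariant to monotone marginal transformations and robust to outliers. The distorted bivariate Gaussian sample below is an illustrative assumption.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

n, rho = 1000, 0.5
Z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
x, y = np.exp(Z[:, 0]), Z[:, 1] ** 3   # monotone marginal distortions

tau, _ = kendalltau(x, y)
rho_hat = np.sin(np.pi / 2 * tau)      # rank-based correlation estimate
print(round(rho_hat, 2))               # close to the latent correlation 0.5
```

In the full method these pairwise estimates fill a correlation matrix that is then plugged into a Gaussian graphical model estimator such as the graphical lasso.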

We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, huge provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also fits semiparametric Gaussian copula models; (3) more functions like...

10.48550/arxiv.2006.14781 preprint EN other-oa arXiv (Cornell University) 2020-01-01