NFDI4DS | UHH-SEMS - Publication Details

John Lafferty

ORCID: 0000-0002-5929-220X

A5060219657

Research Areas

Statistical Methods and Inference
Natural Language Processing Techniques
Bayesian Modeling and Causal Inference
Topic Modeling
Bayesian Methods and Mixture Models
Neural Networks and Applications
Machine Learning and Algorithms
Sparse and Compressive Sensing Techniques
Hemoglobinopathies and Related Disorders
Algorithms and Data Compression
Advanced Statistical Methods and Models
Face and Expression Recognition
Text and Document Classification Technologies
Information Retrieval and Search Behavior
Machine Learning and Data Classification
Distributed Sensor Networks and Detection Algorithms
Iron Metabolism and Disorders
Statistical Methods and Bayesian Inference
Stochastic Gradient Optimization Techniques
Gene expression and cancer classification
Blood groups and transfusion
Advanced Text Analysis Techniques
Speech Recognition and Synthesis
Markov Chains and Monte Carlo Methods
Gaussian Processes and Bayesian Inference

Yale University
2015-2024

Carnegie Mellon University
2007-2020

Johns Hopkins University
2012-2020

University of Chicago
2012-2017

New Mexico Institute of Mining and Technology
2016

Amazon (United States)
2016

University of Pennsylvania
2016

University of Illinois Chicago
2015

Princeton University
2007-2014

Stanford University
2014

Dynamic topic models

OPENALEX - Publications

David M. Blei John Lafferty

A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of a large document collection....

10.1145/1143844.1143859 article EN 2006-01-01

A statistical approach to machine translation

Peter F. Brown John Cocke Stephen A. Della Pietra Vincent J. Della Pietra Fredrick Jelinek and 3 more

In this paper, we present a statistical approach to machine translation. We describe the application of our approach to translation from French to English and give preliminary results.

10.5555/92858.92860 article EN Computational Linguistics 1990-06-01

A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval

ChengXiang Zhai John Lafferty

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator so as to correct the inaccuracy due to data sparseness. In this paper, we study...

10.1145/3130348.3130377 article EN ACM SIGIR Forum 2017-08-02

A study of smoothing methods for language models applied to information retrieval

ChengXiang Zhai John Lafferty

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A central issue in language model estimation is smoothing, the problem of adjusting the maximum likelihood estimator to compensate for data sparseness. In this article, we study the problem of language model smoothing and its...

10.1145/984321.984322 article EN ACM transactions on office information systems 2004-04-01

A correlated topic model of Science

David M. Blei John Lafferty

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions...

10.1214/07-aoas114 article EN The Annals of Applied Statistics 2007-06-01

Inducing features of random fields

S. Della Pietra V. Della Pietra John Lafferty

We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques...

10.1109/34.588021 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 1997-04-01

High-dimensional Ising model selection using ℓ1-regularized logistic regression

Pradeep Ravikumar Martin J. Wainwright John Lafferty

We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently...

10.1214/09-aos691 article EN other-oa The Annals of Statistics 2010-03-08

A study of smoothing methods for language models applied to Ad Hoc information retrieval

ChengXiang Zhai John Lafferty

10.1145/383952.384019 article EN 2001-09-01

Model-based feedback in the language modeling approach to information retrieval

ChengXiang Zhai John Lafferty

The language modeling approach to retrieval has been shown to perform well empirically. One advantage of this new approach is its statistical foundations. However, feedback, as one important component in a retrieval system, has only been dealt with heuristically in this new retrieval approach: the original query is usually literally expanded by adding additional terms to it. Such expansion-based feedback creates an inconsistent interpretation of the original and the expanded query. In this paper, we present a more principled approach to feedback in the language modeling approach. Specifically, we treat feedback as updating the query language model based on the extra...

10.1145/502585.502654 article EN 2001-10-05

Document Language Models, Query Models, and Risk Minimization for Information Retrieval

John Lafferty ChengXiang Zhai

We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to information retrieval. A language model for each document is estimated, as well as a language model for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy and word senses. While recent work has incorporated word translation models for this purpose, we introduce a new method...

10.1145/3130348.3130375 article EN ACM SIGIR Forum 2017-08-02

Statistical Models for Text Segmentation

Doug Beeferman Adam Berger John Lafferty

10.1023/a:1007506220214 article EN Machine Learning 1999-01-01

The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

Han Liu John Lafferty Larry Wasserman

Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula, or "nonparanormal", for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method's theoretical properties, and show that it works well in many...

10.1184/r1/6610712.v1 article EN Journal of Machine Learning Research 2009-12-01

Information Retrieval as Statistical Translation

Adam Berger John Lafferty

We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is a statistical model of how a user might distill or "translate" a given document into a query. To assess the relevance of a document to a user's query, we estimate the probability that the query would have been generated as a translation of the document, and factor in the user's general preferences in the form of a prior distribution over documents. We propose a simple, well motivated model of the document-to-query translation process, and describe an...

10.1145/3130348.3130371 article EN ACM SIGIR Forum 2017-08-02

Beyond Independent Relevance

ChengXiang Zhai William W. Cohen John Lafferty

We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. In such a problem, the utility of a document in a ranking is dependent on other documents in the ranking, violating the assumption of independent relevance which is assumed in most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval which generalizes the traditional precision and recall metrics by accounting...

10.1145/2795403.2795405 article EN ACM SIGIR Forum 2015-06-23

Sparse Additive Models

Pradeep Ravikumar John Lafferty Han Liu Larry Wasserman

Summary. We present a new class of methods for high dimensional non-parametric regression and classification called sparse additive models. Our methods combine ideas from sparse linear modelling and additive non-parametric regression. We derive an algorithm for fitting the models that is practical and effective even when the number of covariates is larger than the sample size. Sparse additive models are essentially a functional version of the grouped lasso of Yuan and Lin. They are also closely related to the COSSO model of Lin and Zhang but decouple smoothing and sparsity, enabling the use of arbitrary non-parametric smoothers....

10.1111/j.1467-9868.2009.00718.x article EN Journal of the Royal Statistical Society Series B (Statistical Methodology) 2009-10-19

Document language models, query models, and risk minimization for information retrieval

John Lafferty ChengXiang Zhai

10.1145/383952.383970 article EN 2001-09-01

High-dimensional semiparametric Gaussian copula graphical models

Han Liu Fang Han Ming Yuan John Lafferty Larry Wasserman

We propose a semiparametric approach called the nonparanormal SKEPTIC for efficiently and robustly estimating high-dimensional undirected graphical models. To achieve modeling flexibility, we consider the nonparanormal graphical models proposed by Liu, Lafferty and Wasserman [J. Mach. Learn. Res. 10 (2009) 2295–2328]. To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman’s rho and Kendall’s tau. We prove that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergence for both graph...

10.1214/12-aos1037 article EN other-oa The Annals of Statistics 2012-08-01

The huge Package for High-dimensional Undirected Graph Estimation in R

Tuo Zhao Han Liu Kathryn Roeder John Lafferty Larry Wasserman

We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like...

10.48550/arxiv.2006.14781 preprint EN other-oa arXiv (Cornell University) 2020-01-01
