- Topic Modeling
- Semantic Web and Ontologies
- Bayesian Modeling and Causal Inference
- Natural Language Processing Techniques
- Web Data Mining and Analysis
- Data Mining Algorithms and Applications
- Information Retrieval and Search Behavior
- Data Quality and Management
- Mobile Crowdsensing and Crowdsourcing
- Expert finding and Q&A systems
- Astro and Planetary Science
- Music and Audio Processing
- Speech and Audio Processing
- Recommender Systems and Techniques
- Speech Recognition and Synthesis
- Consumer Market Behavior and Pricing
- Schizophrenia research and treatment
- Data Management and Algorithms
- Scientific Computing and Data Management
- Multimodal Machine Learning Applications
- Complex Network Analysis Techniques
- Advanced Text Analysis Techniques
- AI-based Problem Solving and Planning
- Logic, Reasoning, and Knowledge
- Neural Networks and Applications
South London and Maudsley NHS Foundation Trust
2018-2022
Microsoft (United States)
2006-2021
Microsoft Research (United Kingdom)
2013-2021
University of Washington
2001-2021
University of Edinburgh
2020
Berkeley College
2020
University of California, Berkeley
2020
California Polytechnic State University
2017
The University of Tokyo
2017
Karlsruhe Institute of Technology
2017
Viral marketing takes advantage of networks of influence among customers to inexpensively achieve large changes in behavior. Our research seeks to put it on a firmer footing by mining these networks from data, building probabilistic models of them, and using these models to choose the best viral marketing plan. Knowledge-sharing sites, where customers review products and advise each other, are a fertile source for this type of data mining. In this paper we extend our previous techniques, achieving a reduction in computational cost, and apply them to data from a knowledge-sharing...
Search engine advertising has become a significant element of the Web browsing experience. Choosing the right ads for the query and the order in which they are displayed greatly affects the probability that a user will see and click on each ad. This ranking has a strong impact on the revenue the search engine receives from the ads. Further, showing the user an ad that they prefer to click on improves user satisfaction. For these reasons, it is important to be able to accurately estimate the click-through rate of ads in the system. For ads that have been displayed repeatedly, this is empirically measurable, but for new ads,...
We present MCTest, a freely available set of stories and associated questions intended for research on the machine comprehension of text. Previous work on machine comprehension (e.g., semantic modeling) has made great strides, but primarily focuses on either limited-domain datasets, or solving a more restricted goal (e.g., open-domain relation extraction). In contrast, MCTest requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension. Reading comprehension can test advanced abilities...
We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different...
We demonstrate the value of collecting semantic parse labels for knowledge base question answering. In particular, (1) unlike previous studies on small-scale datasets, we show that learning from labeled semantic parses significantly improves overall performance, resulting in an absolute 5 point gain compared to learning from answers, (2) we show that with an appropriate user interface, one can obtain semantic parses with high accuracy and at a cost comparable to or lower than obtaining just answers, and (3) we have created and shared the largest semantic-parse labeled dataset to date...
When translating natural language questions into SQL queries to answer questions from a database, contemporary semantic parsing models struggle to generalize to unseen database schemas. The generalization challenge lies in (a) encoding the database relations in an accessible way for the semantic parser, and (b) modeling alignment between database columns and their mentions in a given query. We present a unified framework, based on the relation-aware self-attention mechanism, to address schema encoding, schema linking, and feature representation within a text-to-SQL...
Characterizing the relationship that exists between a person's social group and his/her personal behavior has been a long-standing goal of social network analysts. In this paper, we apply data mining techniques to study this relationship for a population of over 10 million people, by turning to online sources of data. The analysis reveals that people who chat with each other (using instant messaging) are more likely to share interests (their Web searches are the same or topically similar). The more time they spend talking, the stronger this relationship is. People also...
Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are independent of the link structure of the Web. We gain a further boost in accuracy by using data on the frequency at which users visit Web pages. We use RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics. The resulting model achieves a static ranking pairwise accuracy of 67.3% (vs. 56.7% for PageRank or 50% for random).
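The pairwise accuracy figure quoted in the abstract above can be illustrated with a minimal sketch. The abstract does not spell out how the metric is computed, so the definition here is an assumption: the fraction of page pairs with different judged labels that the scores order the same way as the labels, with score ties counted as half-correct. The function name `pairwise_accuracy` and the sample data are hypothetical.

```python
from itertools import combinations


def pairwise_accuracy(scores, labels):
    """Fraction of differently-labeled pairs that the scores order correctly.

    Assumed definition (not taken from the paper): pairs with equal labels
    are skipped, and a tie in scores counts as half-correct.
    """
    correct = 0.0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # equal labels carry no ordering information
        total += 1
        label_diff = labels[i] - labels[j]
        score_diff = scores[i] - scores[j]
        if score_diff == 0:
            correct += 0.5  # assumption: ties are scored as a coin flip
        elif (label_diff > 0) == (score_diff > 0):
            correct += 1.0
    if total == 0:
        raise ValueError("no pairs with differing labels")
    return correct / total


# Hypothetical judged labels (higher is better) and static-rank scores.
labels = [3, 2, 1, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(pairwise_accuracy(scores, labels))
```

Under this assumed definition, a scorer that gives every page the same score lands at exactly 0.5, the level the abstract reports for random ordering.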
Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (StruG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel prediction tasks: column grounding, value grounding and column-value mapping, and leverage them to pretrain a text-table encoder....
In this article, we demonstrate the value of long-term query logs. Most work on query logs to date considers only short-term (within-session) query information. In contrast, we show that long-term query logs can be used to learn about the world we live in. There are many applications of this that lead not only to improving the search engine for its users, but also potentially to advances in other disciplines such as medicine, sociology, economics, and more. In this article, we will show how long-term query logs can be used for these purposes, and that their potential is severely reduced if the logs are limited to short time horizons. We show that query effects...
Mrinmaya Sachan, Kumar Dubey, Eric Xing, Matthew Richardson. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015.
Search systems traditionally require searchers to formulate information needs as keywords rather than in a more natural form, such as questions. Recent studies have found that Web search engines are observing an increase in the fraction of queries phrased as natural language. As part of building better search engines, it is important to understand the nature and prevalence of these intentions, and the impact of this increase on search engine performance. In this work, we show that while 10.3% of queries issued to a search engine have direct question intent, only 3.2% of them are formulated as natural language questions. We...
Acquiring knowledge has long been the major bottleneck preventing the rapid spread of AI systems. Manual approaches are slow and costly. Machine-learning approaches have limitations in the depth and breadth of knowledge they can acquire. The Internet has made possible a third solution: building knowledge bases by mass collaboration, with thousands of volunteers contributing simultaneously. While this approach promises large improvements in the speed and cost of knowledge base development, it can only succeed if the problem of ensuring the quality, relevance and consistency of the knowledge is...
Personalization is ubiquitous in modern online applications as it provides significant improvements in user experience by adapting it to inferred user preferences. However, there are increasing concerns related to issues of privacy and control of the user data that is aggregated by online systems to power personalized experiences. These concerns are particularly significant for user profile aggregation in online advertising. This paper describes a practical, learning-driven client-side personalization approach for keyword advertising platforms, an emerging application...
Synchronous social Q&A systems exist on the Web and in the enterprise to connect people with questions to people with answers in real-time. In such systems, askers' desire for quick answers is in tension with the costs associated with interrupting numerous candidate answerers per question. Supporting users of synchronous social Q&A systems at various points in the question lifecycle (from conception to answer) helps askers make informed decisions about the likelihood of question success and helps answerers face fewer interruptions. For example, predicting that a question will not be well answered may lead...