- Topic Modeling
- Bayesian Modeling and Causal Inference
- Natural Language Processing Techniques
- Data Mining Algorithms and Applications
- Semantic Web and Ontologies
- Machine Learning and Algorithms
- Rough Sets and Fuzzy Logic
- Machine Learning and Data Classification
- Data Stream Mining Techniques
- Data Quality and Management
- Text and Document Classification Technologies
- Fuzzy Logic and Control Systems
- Neural Networks and Applications
- AI-based Problem Solving and Planning
- Advanced Text Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Advanced Graph Neural Networks
- Data Management and Algorithms
- Face and Expression Recognition
- Evolutionary Algorithms and Applications
- Education and Digital Technologies
- Web Data Mining and Analysis
- Gene expression and cancer classification
- Biomedical Text Mining and Ontologies
- Advanced Database Systems and Queries
Administration for Community Living
2023
Carnegie Mellon University
2009-2023
University of California, San Diego
2023
IT University of Copenhagen
2023
Tokyo Institute of Technology
2023
American Jewish Committee
2023
RIKEN Center for Advanced Intelligence Project
2023
Mongolia International University
2023
Pennsylvania State University
2022
Universidade Federal de São Carlos
2009-2019
We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, and describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74%...
Whereas people learn many different types of knowledge from diverse experiences over many years, and become better learners over time, most current machine learning systems are much more narrow, learning just a single function or data model based on statistical analysis of a single data set. We suggest that people learn better than computers precisely because of this difference, and we suggest that a key direction for machine learning research is to develop software architectures that enable intelligent agents to also learn many types of knowledge, continuously over many years, and to become better learners over time. In this paper we define the never-ending learning paradigm...
We consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised learning using only a few labeled examples is typically unreliable because the learning task is underconstrained. This paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling many extractors...
Social networks have been widely studied over the last century from multiple disciplines to understand societal issues such as inequality in employment rates, managerial performance, and epidemic spread. Today, these and many more issues can be studied at global scale thanks to the digital footprints that we generate when browsing the Web or using social media platforms. Unfortunately, scientists often struggle to access such data, primarily because it is proprietary, and even when shared with privacy guarantees, it is either no representative...
Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from a single data set. We propose a never-ending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, and we discuss lessons learned. NELL has been learning to read the web 24 hours/day since January...
The goal of sentiment analysis is to determine opinions, emotions, and attitudes presented in source material. In tweet sentiment analysis, opinions in messages can typically be categorized as positive or negative. To classify them, researchers have been using traditional classifiers like Naive Bayes, Maximum Entropy, and Support Vector Machines (SVM). In this paper, we show that an SVM classifier combined with a cluster ensemble can offer better classification accuracies than a stand-alone SVM. In our study, we employed an...
We consider semi-supervised learning of information extraction methods, especially for extracting instances of noun categories (e.g., 'athlete', 'team') and relations (e.g., 'playsForTeam(athlete, team)'). Semi-supervised approaches using a small number of labeled examples together with many unlabeled examples are often unreliable, as they frequently produce an internally consistent, but nevertheless incorrect, set of extractions. We propose that this problem can be overcome by simultaneously learning classifiers for many different categories and relations in...
Large Language Models (LLMs) have demonstrated remarkable performance on various tasks, yet their ability to extract and internalize deeper insights from domain-specific datasets remains underexplored. In this study, we investigate how continual pre-training can enhance LLMs' capacity for insight learning across three distinct forms: declarative, statistical, and probabilistic insights. Focusing on two critical domains, medicine and finance, we employ LoRA to train LLMs on existing datasets. To evaluate each...
This paper proposes a feature weighting method based on the χ² statistical test, to be used in conjunction with a k-NN classifier. Results of empirical experiments conducted using data from several knowledge domains are presented and discussed. Forty-four out of forty-five favoured the weighted approach, evidence that the proposed χ² weighting process is a good strategy.
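The idea of χ²-based feature weighting for k-NN can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so everything here is an assumption made for illustration: features are treated as discrete, each feature's weight is its Pearson χ² statistic against the class label, and the distance is a weighted overlap (mismatch) distance. The function names `chi2_score`, `chi2_weights`, and `weighted_knn_predict` are hypothetical and do not come from the paper.

```python
from collections import Counter


def chi2_score(values, labels):
    """Pearson chi-squared statistic between one discrete feature and the class."""
    n = len(values)
    value_counts = Counter(values)
    class_counts = Counter(labels)
    joint_counts = Counter(zip(values, labels))
    score = 0.0
    for v, nv in value_counts.items():
        for c, nc in class_counts.items():
            expected = nv * nc / n  # count expected under independence
            observed = joint_counts.get((v, c), 0)
            score += (observed - expected) ** 2 / expected
    return score


def chi2_weights(X, y):
    """One weight per feature column (assumed scheme: the raw chi-squared score)."""
    n_features = len(X[0])
    return [chi2_score([row[j] for row in X], y) for j in range(n_features)]


def weighted_knn_predict(X, y, weights, query, k=3):
    """Majority vote among the k nearest rows under a weighted mismatch distance."""
    distances = []
    for row, label in zip(X, y):
        # A mismatch on feature j costs weights[j]; a match costs nothing.
        d = sum(w for w, a, b in zip(weights, row, query) if a != b)
        distances.append((d, label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data (invented): feature 0 determines the class, feature 1 is noise.
X = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "y"), ("b", "x"), ("b", "y")]
y = ["pos", "pos", "pos", "neg", "neg", "neg"]
weights = chi2_weights(X, y)
prediction = weighted_knn_predict(X, y, weights, ("a", "y"), k=3)
```

On this toy data the informative feature receives a much larger weight than the noisy one, so mismatches on the noisy feature barely affect which neighbours are selected.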
Link prediction is a task that, in graph-based data models as well as in complex networks, aims not only to predict edges that will appear in the near future but also to find missing edges. NELL, a never-ending language learner system, has the ability to continuously learn to extract structured information from unstructured text (fetched from web pages) and to map this information to a growing knowledge base. NELL's knowledge base can be seen as a complex network, allowing us to apply graph mining techniques to find new knowledge and enhance NELL's performance. In this paper we present Prophet, a link...
Bosung Kim, Hayate Iso, Nikita Bhutani, Estevam Hruschka, Ndapa Nakashole, Tom Mitchell. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.