- Semantic Web and Ontologies
- Data Mining Algorithms and Applications
- Advanced Database Systems and Queries
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Academic and Historical Perspectives in Psychology
- Advanced Clustering Algorithms Research
- Service-Oriented Architecture and Web Services
- Logic, Programming, and Type Systems
- Complex Network Analysis Techniques
- Data Management and Algorithms
- Data Analysis with R
- Authorship Attribution and Profiling
- Advanced Software Engineering Methodologies
- Topic Modeling
- Social Representations and Identity
- Business Process Modeling and Analysis
- Opinion Dynamics and Social Influence
- Psychology Research and Bibliometrics
- Algorithms and Data Compression
- Scientific Computing and Data Management
- Logic, Reasoning, and Knowledge
- Software Engineering Research
- Data Visualization and Analytics
- Biomedical Text Mining and Ontologies
Fachhochschule Wiener Neustadt
2016-2017
TU Wien
2006-2016
Vienna University of Economics and Business
2007-2011
First Technical University
2007
During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the <strong>tm</strong> package, which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification, and string kernels.
Clustering text documents is a fundamental task in modern data analysis, requiring approaches which perform well both in terms of solution quality and computational efficiency. Spherical k-means clustering is one approach to address both issues, employing cosine dissimilarities to perform prototype-based partitioning of term weight representations of the documents. This paper presents the theory underlying the standard spherical k-means problem and suitable extensions, and introduces the R extension package <b>skmeans</b>, which provides an environment for...
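The core idea of spherical k-means described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the skmeans package's implementation: the function names (`normalize`, `cosine_dissimilarity`, `spherical_kmeans`), the fixed iteration count, and the random initialization are all assumptions made for the example.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine_dissimilarity(u, v):
    """Return 1 - cos(u, v); assumes both inputs already have unit length."""
    return 1.0 - sum(a * b for a, b in zip(u, v))

def spherical_kmeans(rows, k, iterations=20, seed=0):
    """Partition term-weight vectors into k clusters using cosine dissimilarity.

    Hypothetical helper for illustration: fixed number of iterations,
    prototypes initialized from k randomly chosen documents.
    """
    rng = random.Random(seed)
    docs = [normalize(r) for r in rows]
    prototypes = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document goes to the closest prototype.
        labels = [
            min(range(k), key=lambda j: cosine_dissimilarity(d, prototypes[j]))
            for d in docs
        ]
        # Update step: the new prototype is the normalized sum of its members.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                summed = [sum(col) for col in zip(*members)]
                prototypes[j] = normalize(summed)
    return labels, prototypes

if __name__ == "__main__":
    # Toy term-weight vectors: two documents per "topic".
    weights = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
    labels, prototypes = spherical_kmeans(weights, k=2)
    print(labels)
```

Because every vector is projected onto the unit sphere, document length does not affect the grouping; only the direction of the term-weight vector matters.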
Identifying the language used will typically be the first step in most natural language processing tasks. Among the wide variety of language identification methods discussed in the literature, the ones employing the Cavnar and Trenkle (1994) approach to text categorization based on character n-gram frequencies have been particularly successful. This paper presents the R extension package <b>textcat</b> for n-gram based text categorization, which implements both the Cavnar and Trenkle approach as well as a reduced n-gram approach designed to remove redundancies of the original approach. A multi-lingual corpus obtained from...
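The rank-based n-gram comparison behind this kind of language identification can be sketched as follows. This is a simplified illustration of the general Cavnar and Trenkle idea, not the textcat package's code: the function names, the underscore padding, the profile size of 300, and the tie-breaking rule are assumptions chosen for the example.

```python
from collections import Counter

def ngram_profile(text, max_n=5, top=300):
    """Return the most frequent character n-grams (n = 1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (simplified padding)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a maximum penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else penalty
        for r, gram in enumerate(doc_profile)
    )

def identify(text, profiles):
    """Pick the language whose profile has the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

if __name__ == "__main__":
    # Tiny made-up training samples; real profiles need far more text.
    samples = {
        "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
        "de": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
    }
    profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
    print(identify("the dog and the fox", profiles))
```

In practice the language profiles would be built from much larger corpora; the toy samples here only show the mechanics.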
This study investigated the intellectual structure of early American psychology by generating 3 networks that collectively included every substantive article published in Psychological Review during the 15-year period from the journal's start in 1894 until 1908. The networks were laid out so that articles with strongly correlated vocabularies were positioned close to each other spatially. Then, we identified distinct research communities by locating and interpreting the clusters within the networks. We found that, in the first 5-year time...
Social Network Analysis (SNA) provides tools to examine relationships between people. Text Mining (TM) allows capturing the text they produce in Web 2.0 applications, for example; however, it neglects their social structure. This paper applies an approach to combine the two methods, named "content-based SNA". Using the R mailing lists, R-help and R-devel, we show how this combination can be used to describe people's interests and to find out if authors who have similar interests actually communicate. We find that the expected positive...
Traditionally, American psychology at the turn of the twentieth century has been framed as a competition among a number of "schools": structuralism, functionalism, behaviorism, etc. But this is only one way in which the "structure" of the discipline can be conceived. Most psychologists did not belong to a particular school, but they still worked within loose intellectual communities, and so their work was part of an implicit psychological "genre," if not a formalized "school." In this study, we began the process of discovering...
This study continues a previous investigation of the intellectual structure of early American psychology by presenting and analyzing 3 networks that collectively include every substantive article published in Psychological Review during the 15-year period from 1909 to 1923. The networks were laid out such that articles (represented by the network's nodes) that possessed strongly correlated vocabularies were positioned closer to each other spatially than articles with weakly correlated vocabularies. We identified distinct research communities within...
The American Journal of Psychology (AJP) was the first academic journal in the United States dedicated to the "new" scientific form of the discipline. But where did the journal's founding owner/editor, G. Stanley Hall, find the "psychologists" he needed to fill the pages of such a venture in 1887, when he was still virtually the only professor of psychology in the country? To investigate this question we used the substantive vocabularies of every full article published in AJP's first 14 volumes to generate networks of verbally similar articles. These networks reveal a variety...
The Unified Modelling Language (UML) can be used to specify complex systems: component types are modelled as classes, interdependencies as associations with multiplicities and labels. This paper describes how to handle constraints on associations declaratively by translating them to inequalities over integers without adding complexity. The method provides well-defined semantics and allows for efficient algorithms for reasoning tasks like detecting inconsistencies. We identify some challenges arising from the use of class...
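To make the idea of turning multiplicities into integer inequalities concrete, here is a small sketch for a single binary association between two classes A and B. The abstract does not give the paper's actual translation, so the encoding below is an assumption: it uses the standard link-counting argument (the number of links must fit the bounds seen from both ends) and a brute-force search in place of an integer programming solver. The names `satisfies` and `smallest_instance` are hypothetical.

```python
from itertools import product

def satisfies(n_a, n_b, mult_a, mult_b):
    """Check the link-count inequalities for |A| = n_a and |B| = n_b.

    mult_a = (lo, hi): how many A objects each B object must be linked to.
    mult_b = (lo, hi): how many B objects each A object must be linked to.
    hi may be None for an unbounded upper limit ("*").
    A link count L must satisfy
        mult_b.lo * n_a <= L <= mult_b.hi * n_a  and
        mult_a.lo * n_b <= L <= mult_a.hi * n_b.
    """
    lowest = max(mult_b[0] * n_a, mult_a[0] * n_b)
    uppers = []
    if mult_b[1] is not None:
        uppers.append(mult_b[1] * n_a)
    if mult_a[1] is not None:
        uppers.append(mult_a[1] * n_b)
    # Some L exists exactly when the largest lower bound fits under every upper bound.
    return all(lowest <= upper for upper in uppers)

def smallest_instance(mult_a, mult_b, limit=50):
    """Brute-force the smallest non-empty (|A|, |B|); None if nothing fits up to limit."""
    candidates = [
        (a, b)
        for a, b in product(range(limit + 1), repeat=2)
        if a + b > 0 and satisfies(a, b, mult_a, mult_b)
    ]
    return min(candidates, key=lambda p: (p[0] + p[1], p)) if candidates else None

if __name__ == "__main__":
    # Each A object links to exactly 2 B objects; each B object links to exactly 3 A objects.
    print(smallest_instance(mult_a=(3, 3), mult_b=(2, 2)))
```

The check covers only the link-count inequalities; uniqueness of links, labels, and other UML features would need additional constraints.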
In order to better understand the broader trends and points of contention in early American psychology, it is conventional to organize the relevant material in terms of “schools” of psychology: structuralism, functionalism, etc. Although not without value, this scheme marginalizes many otherwise significant figures, and tends to exclude a large number of secondary, but interesting, individuals. In an effort to address these problems, we grouped all the articles that appeared in the second and third decades of Psychological Review into...
R has gained explicit text mining support with the tm package, enabling statisticians to answer many interesting research questions via statistical analysis or modeling of (text) corpora. However, we typically face two challenges when analyzing large corpora: (1) the amount of data to be processed in a single machine is usually limited by the available main memory (i.e., RAM), and (2) the more data to be analyzed, the higher the need for efficient procedures for calculating valuable results. Fortunately, adequate programming models...
The Unified Modeling Language (UML) has become a universal tool for the formal object-oriented specification of hard- and software. In particular, UML class diagrams and so-called multiplicities, which restrict the number of links between objects, are essential when using UML for applications like the specification of admissible configurations of components. In this paper we give a definition of the semantics of multiplicities. We extend results obtained in the context of entity relationship diagrams to cover UML-specific extensions like the (non-)uniqueness attribute of binary...
Configuration of large-scale applications in an engineering context requires a modeling environment that allows the design engineer to draft the configuration problem in a natural way and efficient methods that can process the modeled setting and scale well with the number of components. Existing artificial intelligence methods typically perform quite well in certain subareas but are hard to use for general-purpose modeling without a mathematical or logics background (the so-called knowledge acquisition bottleneck) and/or have scalability...