Carlotta Domeniconi

ORCID: 0000-0003-2140-9596
Research Areas
  • Advanced Clustering Algorithms Research
  • Face and Expression Recognition
  • Text and Document Classification Technologies
  • Complex Network Analysis Techniques
  • Advanced Image and Video Retrieval Techniques
  • Machine Learning in Bioinformatics
  • Data Management and Algorithms
  • Machine Learning and Data Classification
  • Advanced Graph Neural Networks
  • Mobile Crowdsensing and Crowdsourcing
  • Domain Adaptation and Few-Shot Learning
  • Bioinformatics and Genomic Networks
  • Computational Drug Discovery Methods
  • Image Retrieval and Classification Techniques
  • Neural Networks and Applications
  • Data Stream Mining Techniques
  • Anomaly Detection Techniques and Applications
  • Machine Learning and Algorithms
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Data Mining Algorithms and Applications
  • Natural Language Processing Techniques
  • Bayesian Methods and Mixture Models
  • Gene Expression and Cancer Classification
  • Opinion Dynamics and Social Influence

George Mason University
2016-2025

Shandong University
2021-2023

Southwest University
2020-2021

King Abdullah University of Science and Technology
2019-2020

Baylor College of Medicine
2014

Nanyang Technological University
2014

National University of Singapore
2014

Technologies pour la Santé
2014

University of Technology Sydney
2014

Jamia Millia Islamia
2014

This paper presents Online Topic Model (OLDA), a topic model that automatically captures the thematic patterns and identifies emerging topics of text streams and their changes over time. Our approach allows the topic modeling framework, specifically the Latent Dirichlet Allocation (LDA) model, to work in an online fashion such that it incrementally builds an up-to-date model (mixture of topics per document and mixture of words per topic) when a new document (or a set of documents) appears. A solution based on the Empirical Bayes method is proposed. The idea...

10.1109/icdm.2008.140 article EN 2008-12-01

Protein function prediction is to assign biological or biochemical functions to proteins, and it is a challenging computational problem characterized by several factors: (1) the number of function labels (annotations) is large; (2) a protein may be associated with multiple labels; (3) the function labels are structured in a hierarchy; and (4) the labels are incomplete. Current predictive models often assume that the labels of the labeled proteins are complete, i.e. no label is missing. But in real scenarios, we may be aware of only some hierarchical labels of a protein, and we may not know whether additional...

10.1186/s12859-014-0430-y article EN cc-by BMC Bioinformatics 2015-01-15

Long non-coding RNAs (lncRNAs) play crucial roles in complex disease diagnosis, prognosis, prevention and treatment, but only a small portion of lncRNA-disease associations have been experimentally verified. Various computational models have been proposed to identify lncRNA-disease associations by integrating heterogeneous data sources. However, existing models generally ignore the intrinsic structure of data sources or treat them as equally relevant, while they may not be. To accurately identify lncRNA-disease associations, we propose a Matrix Factorization based...

10.1093/bioinformatics/btx794 article EN Bioinformatics 2017-12-05

Nearest-neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest-neighbor rule. We propose a locally adaptive nearest-neighbor classification method to try to minimize bias. We use a chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are highly adaptive to query locations. Neighborhoods are elongated along less relevant...

10.1109/tpami.2002.1033219 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2002-09-01
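
The role of a flexible metric in the abstract above can be illustrated with a minimal sketch of feature-weighted k-nearest-neighbor voting. The weights here are hand-picked, hypothetical values; the chi-squared analysis that the paper uses to derive them per query is not shown, and the function name `weighted_knn_predict` and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter

def weighted_knn_predict(train_X, train_y, query, weights, k=3):
    """Majority vote among the k training points closest to `query`
    under a feature-weighted Euclidean distance."""
    def dist(x):
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, x, query)))
    # Indices of the k nearest training points (ties keep input order).
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
train_X = [(0.0, 0.0), (0.1, 10.0), (0.2, 20.0),
           (1.0, 0.5), (1.1, 1.0), (0.9, 19.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
query = (0.1, 0.6)

# Equal weights let the noisy feature dominate the neighborhood.
uniform = weighted_knn_predict(train_X, train_y, query, weights=(1.0, 1.0))
# A zero weight on feature 1 stretches the neighborhood along that axis.
adapted = weighted_knn_predict(train_X, train_y, query, weights=(1.0, 0.0))
```

Giving a small weight to a less relevant feature is what elongates the neighborhood along that feature: points that differ only in the noisy coordinate still count as close.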

Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of the natural language. The traditional document representation is a word-based vector (Bag of Words, or BOW), where each dimension is associated with a term of the dictionary containing all the words that appear in the corpus. Although simple and commonly used, this representation has several limitations. It is essential to embed semantic information and conceptual patterns in order to enhance the prediction capabilities...

10.1145/1401890.1401976 article EN 2008-08-24

Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition...

10.1145/1460797.1460800 article EN ACM Transactions on Knowledge Discovery from Data 2009-01-01

Learning from multi-view multi-label data has wide applications. There are two main challenges of this learning task: incomplete views and missing (weak) labels. The former assumes that views may not include all data objects. The weak label setting implies that only a subset of relevant labels are provided for training objects while other labels are missing. Both incomplete views and missing labels can lead to significant performance degradation. In this paper, we propose a novel model (iMVWL) to jointly address the two challenges. iMVWL simultaneously learns a shared subspace...

10.24963/ijcai.2018/375 article EN 2018-07-01

Heterogeneous network embedding (HNE) is a challenging task due to the diverse node types and/or diverse relationships between nodes. Existing HNE methods are typically unsupervised. To maximize the profit of utilizing the rare and valuable supervised information in HNEs, we develop a novel Active Heterogeneous Network Embedding (ActiveHNE) framework, which includes two components: Discriminative Heterogeneous Network Embedding (DHNE) and Active Query in Heterogeneous Networks (AQHN). In DHNE, we introduce a novel semi-supervised heterogeneous network embedding method based on graph convolutional neural...

10.24963/ijcai.2019/294 preprint EN 2019-07-28

Finding approximate answers to multi-dimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. We present a new histogram technique that is designed to approximate the density of multi-dimensional datasets with real attributes. Our technique finds buckets of variable size, and allows the buckets to overlap. Overlapping buckets allow...

10.1145/335191.335448 article EN ACM SIGMOD Record 2000-05-16
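
The estimation problem stated in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's bucket-construction technique: it assumes the buckets are already given as hypothetical `(low, high, count)` triples, that points are spread uniformly inside each bucket, and that the contributions of overlapping buckets simply add.

```python
def overlap_fraction(bucket_lo, bucket_hi, q_lo, q_hi):
    """Fraction of a bucket's volume that falls inside the query box."""
    frac = 1.0
    for bl, bh, ql, qh in zip(bucket_lo, bucket_hi, q_lo, q_hi):
        width = bh - bl
        inter = min(bh, qh) - max(bl, ql)
        if inter <= 0 or width <= 0:
            return 0.0  # no overlap in this dimension
        frac *= inter / width
    return frac

def estimate_count(buckets, q_lo, q_hi):
    """Approximate number of records inside the range query.

    Each bucket contributes its count scaled by the overlapping volume
    (uniformity assumption); overlapping buckets are summed.
    """
    return sum(count * overlap_fraction(lo, hi, q_lo, q_hi)
               for lo, hi, count in buckets)

# Two made-up 2-d buckets; the second lies inside the first.
buckets = [
    ((0.0, 0.0), (4.0, 4.0), 100),
    ((1.0, 1.0), (3.0, 3.0), 40),
]
estimate = estimate_count(buckets, q_lo=(0.0, 0.0), q_hi=(2.0, 2.0))
```

In this toy setup the query covers a quarter of each bucket, so the estimate is a quarter of the combined counts.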

In this paper we address the issue of using local embeddings for data visualization in two and three dimensions, and for classification. We advocate their use on the basis that they provide an efficient mapping procedure from the original dimension of the data, to a lower intrinsic dimension. We depict how they can accurately capture the user's perception of similarity in high-dimensional data for visualization purposes. Moreover, we exploit the low-dimensional mapping provided by these embeddings, to develop new classification techniques, and we show experimentally that the classification accuracy is...

10.1145/775047.775143 article EN 2002-07-23

SVMs (support vector machines) suffer from the problem of large memory requirement and CPU time when trained in batch mode on large data sets. We overcome these limitations, and at the same time make SVMs suitable for learning with data streams, by constructing incremental learning algorithms. We first introduce and compare different incremental learning techniques, and show that they are capable of producing performance results similar to the batch algorithm, and in some cases superior condensation properties. We then consider the problem of training SVMs using stream data. Our objective is to maintain...

10.1109/icdm.2001.989572 article EN 2002-11-14

Subspace Clustering of High Dimensional Data. Carlotta Domeniconi, Dimitris Papadopoulos, Dimitrios Gunopulos, and Sheng Ma. Proceedings of the 2004 SIAM International Conference on Data Mining (SDM), pp. 517-521. Abstract: Clustering suffers from the curse of dimensionality, and similarity functions that use all input features...

10.1137/1.9781611972740.58 article EN 2004-04-22

Current efforts on multi-label learning generally assume that the given labels of training instances are noise-free. However, obtaining noise-free labels is quite difficult and often impractical, and the presence of noisy labels may compromise the performance of multi-label learning. Partial multi-label learning (PML) addresses the scenario in which each instance is annotated with a set of candidate labels, of which only a subset corresponds to the ground-truth. The PML problem is more challenging than partial-label learning, since the latter assumes only one label is valid and may ignore the correlation...

10.1109/icdm.2018.00192 article EN 2018 IEEE International Conference on Data Mining (ICDM) 2018-11-01

In multiview multilabel learning, each object is represented by several heterogeneous feature representations and is also annotated with a set of discrete nonexclusive labels. Previous studies typically focus on capturing the shared latent patterns among multiple views, while not sufficiently considering the diverse characteristics of individual views, which can cause performance degradation. In this article, we propose a novel approach [individuality- and commonality-based multiview multilabel learning (ICM2L)] to explicitly explore...

10.1109/tcyb.2019.2950560 article EN IEEE Transactions on Cybernetics 2019-11-19

Cross-modal hashing has been receiving increasing interest for its low storage cost and fast query speed in multi-modal data retrieval. However, most existing hashing methods are based on hand-crafted or raw level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity. The complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a...

10.1609/aaai.v33i01.33014400 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

Multi-view clustering aims at integrating complementary information from multiple heterogeneous views to improve clustering results. Existing multi-view clustering solutions can only output a single clustering of the data. Due to their multiplicity, multi-view data can have different groupings that are reasonable and interesting from different perspectives. However, how to find multiple, meaningful, and diverse clustering results from multi-view data is still a rarely studied and challenging topic in multi-view clustering and multiple clusterings. In this paper, we introduce a deep matrix factorization based solution (DMClusts)...

10.1609/aaai.v34i04.6104 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

The nearest neighbor technique is a simple and appealing approach to addressing classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. The employment of a locally adaptive metric becomes crucial in order to keep class conditional probabilities close to uniform, thereby minimizing the bias of estimates. We propose a technique that computes a locally flexible metric by...

10.1109/tnn.2005.849821 article EN IEEE Transactions on Neural Networks 2005-07-01

High-throughput experimental techniques produce several kinds of heterogeneous proteomic and genomic data sets. To computationally annotate proteins, it is necessary and promising to integrate these heterogeneous data sources. Some methods transform these data sources into different kernels or feature representations. Next, these kernels are linearly (or nonlinearly) combined into a composite kernel. The composite kernel is utilized to develop a predictive model to infer the function of proteins. A protein can have multiple roles and functions (or labels). Therefore,...

10.1109/tcbb.2013.111 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013-07-01

We analyzed informal learning in Scratch Online, an online community with over 4.3 million users and 6.7 million pieces of user-generated content. Users develop projects, which are graphical interfaces involving manipulation of programming blocks. We investigated two fundamental questions: how can we model informal learning, and what patterns of informal learning emerge. We proceeded in two phases. First, we modeled learning as a trajectory of cumulative programming block usage by long-term users who created at least 50 projects. Second, we applied K-means++ clustering to uncover...

10.1145/2724660.2724674 article EN 2015-03-09
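
The two phases named in the abstract above can be sketched roughly as follows: building a cumulative-usage trajectory, then choosing initial cluster centers with k-means++ seeding. This is only an illustrative sketch. The per-project counts are made up, the helper names are hypothetical, and the Lloyd iterations that follow seeding in a full K-means++ run are omitted.

```python
import random
from itertools import accumulate

def cumulative_trajectory(per_project_counts):
    """Running total of block usage across a user's projects."""
    return list(accumulate(per_project_counts))

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seeds(points, k, seed=0):
    """k-means++ seeding: the first center is uniform at random; each
    later center is drawn with probability proportional to its squared
    distance from the nearest center chosen so far.
    Assumes `points` contains at least k distinct points."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

# Made-up per-project block counts for four hypothetical users.
raw_counts = [[1, 1, 1], [1, 2, 1], [5, 5, 5], [6, 5, 5]]
trajectories = [cumulative_trajectory(c) for c in raw_counts]
seeds = kmeanspp_seeds(trajectories, k=2, seed=1)
```

Because an already chosen point has zero distance to itself, it gets zero weight and is not drawn again, which is why the seeds tend to land in different regions of the data.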