- Machine Learning and Data Classification
- Data Management and Algorithms
- Algorithms and Data Compression
- Machine Learning and Algorithms
- Topic Modeling
- Natural Language Processing Techniques
- Neural Networks and Applications
- Sparse and Compressive Sensing Techniques
- Statistical Methods and Inference
- Gaussian Processes and Bayesian Inference
- Face and Expression Recognition
- Explainable Artificial Intelligence (XAI)
- Bayesian Modeling and Causal Inference
- Bayesian Methods and Mixture Models
- Scientific Research and Discoveries
- Advanced Image and Video Retrieval Techniques
- Astronomy and Astrophysical Research
- Galaxies: Formation, Evolution, Phenomena
- Stochastic Gradient Optimization Techniques
- Advanced Bandit Algorithms Research
- Logic, Reasoning, and Knowledge
- Image Retrieval and Classification Techniques
- Computational Physics and Python Applications
- Astronomical Observations and Instrumentation
- Text and Document Classification Technologies
Purdue University West Lafayette
2024
IBM Research - Thomas J. Watson Research Center
2020-2023
Queensland University of Technology
2023
University of Glasgow
2019-2022
IBM (United States)
2019-2021
IBM Research - Ireland
2020
University of North Carolina at Charlotte
2019
University of the West of Scotland
2018
Georgia Institute of Technology
2006-2017
GEI Consultants
2017
We present a catalog of 1,172,157 quasar candidates selected from the photometric imaging data of the Sloan Digital Sky Survey (SDSS). The objects are all point sources to a limiting magnitude of i = 21.3 from 8417 deg² of imaging from SDSS Data Release 6 (DR6). This sample extends our previous catalog by using the latest SDSS public release data and probing both ultraviolet (UV)-excess and high-redshift quasars. While the addition of high-redshift candidates reduces the overall efficiency (quasars:quasar candidates) of the catalog to ∼80%, it is expected to contain no fewer than 850,000 bona fide...
The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques in automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g. accuracy or run-time efficiency). Though not yet widely adopted, we are interested in understanding how AutoAI will...
We present a catalog of 100,563 unresolved, UV-excess (UVX) quasar candidates to g = 21 from 2099 deg² of the Sloan Digital Sky Survey (SDSS) Data Release One (DR1) imaging data. Existing spectra of 22,737 sources reveal that 22,191 (97.6%) are quasars; accounting for the magnitude dependence of this efficiency, we estimate that 95,502 (95.0%) of the objects in the catalog are quasars. Such a high efficiency is unprecedented in broad-band surveys of quasars. This "proof-of-concept" sample is designed to be maximally efficient, but still has 94.7% completeness g...
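The two efficiency percentages quoted in this abstract follow directly from the counts it gives. A minimal check, using only those counts (the function name `efficiency` is introduced here for illustration):

```python
def efficiency(confirmed, total):
    """Percentage of candidates that are (or are estimated to be) quasars."""
    return 100.0 * confirmed / total


# Counts taken from the abstract above.
spectroscopic = efficiency(22_191, 22_737)   # sources with existing spectra
catalog = efficiency(95_502, 100_563)        # estimate for the full catalog

print(f"spectroscopic efficiency: {spectroscopic:.1f}%")
print(f"estimated catalog efficiency: {catalog:.1f}%")
```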
We present new measurements of the quasar autocorrelation from a sample of ~80,000 photometrically-classified quasars taken from SDSS DR1. We find a best-fit model of $\omega(\theta) = (0.066^{+0.026}_{-0.024})\theta^{-(0.98\pm0.15)}$ for the angular autocorrelation, consistent with estimates from spectroscopic quasar surveys. We show that only models with little or no evolution in the clustering of quasars in comoving coordinates since z~1.4 can recover a scale-length consistent with local galaxies and Active Galactic Nuclei (AGNs). A model with little or no clustering evolution is best explained in the current...
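The best-fit model above is a power law in the angular separation. The sketch below evaluates it at the central values only (amplitude 0.066, slope 0.98) and ignores the quoted uncertainties. The unit of `theta` is not stated in this excerpt, so the inputs are illustrative, and the function name `omega` is hypothetical.

```python
import math


def omega(theta, amplitude=0.066, slope=0.98):
    """Power-law angular autocorrelation: amplitude * theta**(-slope)."""
    return amplitude * theta ** (-slope)


# On log-log axes a power law is a straight line whose gradient is -slope.
gradient = math.log(omega(10.0) / omega(1.0)) / math.log(10.0)

for theta in (0.5, 1.0, 2.0, 10.0):
    print(f"theta = {theta:5.1f}  omega = {omega(theta):.4f}")
```

At `theta = 1` the model returns the amplitude itself, which is a quick sanity check when comparing fits.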
The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied. However, the closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting to the best of our knowledge. In this paper we consider this problem and contrast it with the previous problems considered. First, we propose a general branch-and-bound algorithm based on a (single) tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Our proposed branch-and-bound algorithms...
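To illustrate the branch-and-bound idea for inner-product search, here is a simplified sketch. It is not the paper's tree algorithm: it uses a flat list of balls instead of a tree, and relies only on the Cauchy-Schwarz bound that any point p in a ball with center c and radius r satisfies ⟨q, p⟩ ≤ ⟨q, c⟩ + r‖q‖. All function names and the toy data are assumptions made for this example.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_ball(points):
    """Summarize a group of points by its centroid and enclosing radius."""
    dim = len(points[0])
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(math.dist(center, p) for p in points)
    return center, radius, points


def max_inner_product(query, balls):
    """Return (best point, best value, points scanned) using ball pruning."""
    q_norm = math.sqrt(dot(query, query))
    # Upper bound on <query, p> for every p in each ball.
    bounded = sorted(
        ((dot(query, c) + r * q_norm, pts) for c, r, pts in balls),
        key=lambda item: item[0],
        reverse=True,
    )
    best_value, best_point, scanned = -math.inf, None, 0
    for bound, pts in bounded:
        if bound <= best_value:
            break  # this ball and all later ones cannot beat the current best
        for p in pts:
            scanned += 1
            value = dot(query, p)
            if value > best_value:
                best_value, best_point = value, p
    return best_point, best_value, scanned


# Toy data: 200 random 3-D points, grouped into balls of 20 after sorting.
random.seed(0)
points = sorted(tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(200))
balls = [make_ball(points[i:i + 20]) for i in range(0, len(points), 20)]
query = (1.0, 0.5, -0.25)

best_point, best_value, scanned = max_inner_product(query, balls)
brute_force = max(dot(query, p) for p in points)
print(best_value, brute_force, scanned)
```

A tree-based version would apply the same bound recursively to nested balls, so whole subtrees can be skipped rather than single groups.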
We propose a novel framework seamlessly providing key properties of both neural nets (learning) and symbolic logic (knowledge and reasoning). Every neuron has a meaning as a component of a formula in a weighted real-valued logic, yielding a highly interpretable disentangled representation. Inference is omnidirectional rather than focused on predefined target variables, and corresponds to logical reasoning, including classical first-order logic theorem proving as a special case. The model is end-to-end differentiable, and learning...
Pavan Kapanipathi, Ibrahim Abdelaziz, Srinivas Ravishankar, Salim Roukos, Alexander Gray, Ramón Fernandez Astudillo, Maria Chang, Cristina Cornelio, Saswati Dana, Achille Fokoue, Dinesh Garg, Alfio Gliozzo, Sairam Gurajada, Hima Karanam, Naweed Khan, Khandelwal, Young-Suk Lee, Yunyao Li, Francois Luus, Ndivhuwo Makondo, Nandana Mihindukulasooriya, Tahira Naseem, Sumit Neelam, Lucian Popa, Revanth Gangi Reddy, Ryan Riegel, Gaetano Rossiello, Udit Sharma, G P Shrivatsa Bhargav, Mo Yu. Findings...
Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation can be divided into two principal classes: parametric methods, such as Bayesian networks, and nonparametric methods, such as kernel density estimation and smoothing splines. While neither choice should be universally preferred for all situations, a well-known benefit of nonparametric methods is their ability to achieve estimation optimality for ANY input distribution as more data are observed, a property that no model with a parametric assumption can have, and one...
We present evidence of a large-angle correlation between the cosmic microwave background measured by WMAP and a catalog of photometrically detected quasars from the SDSS. The observed cross-correlation is (0.30 ± 0.14) μK at zero lag, with a shape consistent with that expected for correlations arising from the integrated Sachs-Wolfe (ISW) effect. The photometric redshifts of the quasars are centered at z ~ 1.5, making this the deepest survey in which such a correlation has been observed. Assuming this correlation is due to the ISW effect, this constitutes the earliest evidence yet for dark energy and it can be used...
Background: The majority of ovarian cancer biomarker discovery efforts focus on the identification of proteins that can improve the predictive power of presently available diagnostic tests. We here show that metabolomics, the study of metabolic changes in biological systems, can also provide characteristic small molecule fingerprints related to this disease. Results: In this work, new approaches to automatic classification of metabolomic data produced from sera of ovarian cancer patients and benign controls are investigated. The performance...
The Euclidean Minimum Spanning Tree (EMST) problem has applications in a wide range of fields, and many efficient algorithms have been developed to solve it. We present a new, fast, general EMST algorithm, motivated by the clustering and analysis of astronomical data. Large-scale astronomical surveys, including the Sloan Digital Sky Survey, and large simulations of the early universe, such as the Millennium Simulation, can contain millions of points and fill terabytes of storage. Traditional EMST methods scale quadratically, and more advanced methods lack rigorous...
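For reference, the quadratic baseline that the abstract contrasts against can be sketched with Prim's algorithm on the implicit complete graph of pairwise Euclidean distances. This is the traditional approach, not the new algorithm the paper presents; the function name `emst_prim` and the sample points are assumptions for illustration.

```python
import math


def emst_prim(points):
    """Euclidean minimum spanning tree via Prim's algorithm, O(n^2) time.

    Returns a list of (i, j, distance) edges over indices into `points`.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex providing that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Relax the connection cost of every remaining vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


sample = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
tree = emst_prim(sample)
total_length = sum(d for _, _, d in tree)
print(tree, total_length)
```

Every step scans all remaining points, which is why this approach becomes impractical for the survey-scale data sets the abstract mentions.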
Background: Ovarian cancer diagnosis is problematic because the disease is typically asymptomatic, especially at early stages of progression and/or recurrence. We report here the integration of a new mass spectrometric technology with a novel support vector machine computational method for use in cancer diagnostics, and describe the application of the method to ovarian cancer. Methods: We coupled a high-throughput ambient ionization technique for mass spectrometry (direct analysis in real-time mass spectrometry) to profile relative metabolite...
In this paper we develop density estimation trees (DETs), the natural analog of classification trees and regression trees, for the task of density estimation. We consider the estimation of a joint probability density function of a d-dimensional random vector X and define a piecewise constant estimator structured as a decision tree. The integrated squared error is minimized to learn the tree. We show that the method is nonparametric: under standard conditions of nonparametric density estimation, DETs are shown to be asymptotically consistent. In addition, being decision trees, DETs perform automatic feature...
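A piecewise constant estimator assigns each leaf the density n_leaf / (N · V_leaf), where V_leaf is the leaf's volume. Substituting this into the integrated squared error and dropping the term that does not depend on the estimator leaves a per-leaf score of −n_leaf² / (N² · V_leaf). The sketch below uses that score, derived here as an assumption rather than quoted from the paper, to choose a single split in one dimension. The names `leaf_error` and `best_split_1d` are hypothetical.

```python
def leaf_error(count, volume, n_total):
    """Data-dependent part of the integrated squared error for one leaf."""
    return -(count ** 2) / (n_total ** 2 * volume)


def best_split_1d(samples, low, high):
    """Return (error, cut) for the best single split of [low, high].

    Candidate cuts are midpoints between consecutive sorted samples.
    `cut` is None when no split improves on the unsplit interval.
    """
    xs = sorted(samples)
    n = len(xs)
    best_error, best_cut = leaf_error(n, high - low, n), None
    for a, b in zip(xs, xs[1:]):
        cut = (a + b) / 2.0
        if not low < cut < high:
            continue
        n_left = sum(1 for x in xs if x < cut)
        error = (leaf_error(n_left, cut - low, n)
                 + leaf_error(n - n_left, high - cut, n))
        if error < best_error:
            best_error, best_cut = error, cut
    return best_error, best_cut


data = [0.10, 0.12, 0.15, 0.18, 0.20, 0.90]
unsplit_error = leaf_error(len(data), 1.0, len(data))
split_error, cut = best_split_1d(data, 0.0, 1.0)
print(unsplit_error, split_error, cut)
```

A full tree would apply this search recursively over every dimension and then prune, which this single-split sketch leaves out.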
MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011, offering both a simple, consistent API accessible to novice users and high performance and flexibility to expert users by leveraging modern features of C++. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. MLPACK version 1.0.3, licensed under the LGPL, is available at http://www.mlpack.org.
We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer and continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent...
Interest in logics with some notion of real-valued truths has existed since at least Boole and has been increasing in AI due to the emergence of neuro-symbolic approaches, though often their logical inference capabilities are characterized only qualitatively. We provide foundations for establishing the correctness and power of such systems. We introduce a rich class of multidimensional sentences, with a sound and complete axiomatization that can be parameterized to cover many real-valued logics, including all the common fuzzy logics, and extend these to weighted...