- Data Management and Algorithms
- Parallel Computing and Optimization Techniques
- Algorithms and Data Compression
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
- Machine Learning and Algorithms
- Numerical Methods and Algorithms
- Adversarial Robustness in Machine Learning
- Face and Expression Recognition
- Distributed and Parallel Computing Systems
- Embedded Systems Design Techniques
- Advanced Clustering Algorithms Research
- Neural Networks and Applications
- Sparse and Compressive Sensing Techniques
- Complexity and Algorithms in Graphs
- Data Mining Algorithms and Applications
- Network Security and Intrusion Detection
- Stochastic Gradient Optimization Techniques
- Music and Audio Processing
- Advanced Malware Detection Techniques
- Anomaly Detection Techniques and Applications
- Automated Road and Building Extraction
- Advanced Multi-Objective Optimization Algorithms
- Advanced Database Systems and Queries
- Semantic Web and Ontologies
Booz Allen Hamilton (United States)
2024
NortonLifeLock (United States)
2016-2021
Freie Universität Berlin
2018
Czech Academy of Sciences, Institute of Computer Science
2018
Data61
2017
Commonwealth Scientific and Industrial Research Organisation
2017
Georgia Institute of Technology
2011-2015
Georgia Tech Research Institute
2014
The C++ language is often used for implementing functionality that is performance and/or resource sensitive. While the standard library provides many useful algorithms (such as sorting), in its current form it does not provide direct handling of linear algebra (matrix maths). Armadillo is an open source linear algebra library for the C++ language, aiming towards a good balance between speed and ease of use. Its high-level Application Programming Interface (API) is deliberately similar to the widely used Matlab and Octave languages...
Deep neural networks (DNNs) are powerful nonlinear architectures that are known to be robust to random perturbations of the input. However, these models are vulnerable to adversarial perturbations: small input changes crafted explicitly to fool the model. In this paper, we ask whether a DNN can distinguish adversarial samples from their normal and noisy counterparts. We investigate model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features...
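The dropout-based uncertainty estimate mentioned above can be illustrated with Monte Carlo dropout: dropout stays active at prediction time, the model is run many times, and the spread of the outputs serves as an uncertainty score. The sketch below is a toy under stated assumptions: a single linear layer with dropout on its inputs stands in for a deep network, and the names `dropout_forward` and `mc_dropout_uncertainty` are hypothetical, not taken from the paper.

```python
import random
import statistics

def dropout_forward(x, weights, p_drop, rng):
    """One stochastic pass of a toy linear model with inverted dropout on its
    inputs: each input is zeroed with probability p_drop, and survivors are
    rescaled by 1 / (1 - p_drop) so the expected output is unchanged."""
    keep = 1.0 - p_drop
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:
            total += wi * xi / keep
    return total

def mc_dropout_uncertainty(x, weights, p_drop=0.5, samples=200, seed=0):
    """Monte Carlo dropout: run many stochastic passes and return the sample
    mean (prediction) and sample variance (uncertainty score) of the outputs."""
    rng = random.Random(seed)
    outputs = [dropout_forward(x, weights, p_drop, rng) for _ in range(samples)]
    return statistics.mean(outputs), statistics.variance(outputs)
```

With `p_drop=0.0` every pass is identical and the variance is zero; a detector in this spirit would flag inputs whose variance is unusually high compared with that seen on clean data.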
Modern malware typically makes use of a domain generation algorithm (DGA) to avoid command and control domains or IPs being seized or sinkholed. This means that an infected system may attempt to access many domains in an attempt to contact the command and control server. Therefore, the automatic detection of DGA domains is an important task, both for the sake of blocking malicious domains and for identifying compromised hosts. However, many DGAs use English wordlists to generate plausibly clean-looking domain names; this makes automatic detection difficult. In this work, we devise a notion of difficulty for DGA families called the smashword...
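One way to see why wordlist-based DGAs are hard to catch is a common hand-crafted feature, the Shannon entropy of a domain's characters: random-looking labels tend to score high, while concatenated dictionary words tend to look like ordinary English names. The sketch below computes that feature only; it is an illustrative assumption, not the smashword score defined in the paper, and `char_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def char_entropy(label):
    """Shannon entropy, in bits per character, of a domain label.

    H = -sum over distinct characters c of p(c) * log2 p(c),
    where p(c) is the fraction of positions holding c."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A label of one repeated character has entropy 0, and four distinct characters give 2 bits; a threshold on such a score separates wordlist-generated names poorly because their character statistics resemble legitimate ones.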
MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011, offering both a simple, consistent API accessible to novice users and high performance and flexibility to expert users by leveraging modern features of C++. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. MLPACK version 1.0.3, licensed under the LGPL, is available at http://www.mlpack.org.
For over 15 years, the mlpack machine learning library has served as a "swiss army knife" for C++-based machine learning (Curtin et al., 2013). Its efficient implementations of common and cutting-edge machine learning algorithms have been used in a wide variety of scientific and industrial applications. This paper overviews mlpack 4, a significant upgrade over its predecessor mlpack 3 (Curtin et al., 2018). The library has been significantly refactored and redesigned to facilitate an easier prototyping-to-deployment pipeline, including bindings to other languages (Python, Julia, R, Go, and the command...
A major challenge in the deployment of scientific software solutions is the adaptation of research prototypes to production-grade code. While high-level languages like MATLAB are useful for rapid prototyping, they lack the resource efficiency required for scalable production applications, necessitating translation into lower level languages like C++. Further, machine learning and signal processing applications rely on underlying linear algebra primitives, generally provided by the standard BLAS and LAPACK libraries, which are unwieldy and difficult to use, requiring...
The wide applicability of kernels makes the problem of max-kernel search ubiquitous and more general than the usual similarity search in metric spaces. We focus on solving this problem efficiently. We begin by characterizing the inherent hardness of the problem with a novel notion of directional concentration. Following that, we present a method to use an O(n log n) algorithm to index any set of objects (points in R^D or abstract objects) directly in the Hilbert space without any explicit feature representations of the objects in this space. We present the first provably O(log n) algorithm for exact max-kernel search using...
The problem of max-kernel search arises everywhere: given a query point $p_q$, a set of reference objects $S_r$ and some kernel...
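For reference, the problem being accelerated can be written as finding $\arg\max_{p_r \in S_r} K(p_q, p_r)$. The brute-force baseline below performs one kernel evaluation per reference object; the indexing methods described above aim to avoid this linear scan. The two kernels and the function names are illustrative assumptions.

```python
import math

def linear_kernel(a, b):
    """Inner product kernel K(a, b) = <a, b>."""
    return sum(x * y for x, y in zip(a, b))

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def brute_force_max_kernel(p_q, S_r, kernel):
    """Return (index, value) of the reference object maximizing kernel(p_q, p_r).

    Linear scan: O(|S_r|) kernel evaluations per query."""
    best_i, best_k = -1, -math.inf
    for i, p_r in enumerate(S_r):
        k = kernel(p_q, p_r)
        if k > best_k:
            best_i, best_k = i, k
    return best_i, best_k
```

With `S_r = [(0, 0), (1, 0), (3, 3)]` and `p_q = (1, 1)`, the linear kernel picks index 2 while the Gaussian kernel picks index 1, the nearest point. This is one concrete sense in which max-kernel search is more general than nearest-neighbor search: for kernels whose self-similarity K(x, x) varies, the maximizer need not be the closest point.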
Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables. We refer to these queries as FAQ-AI for short. To answer FAQ-AI in the Boolean semiring, we define relaxed tree decompositions and relaxed submodular and fractional hypertree width parameters. We show that an extension of the InsideOut algorithm using Chazelle's geometric data structure...
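A toy instance shows why additive inequalities need not force materializing a full join: a COUNT over two relations with the single condition a + b <= c can be answered by sorting one side and binary searching, instead of enumerating every pair. This two-relation, one-inequality case is an illustrative assumption only; it is neither the extended InsideOut algorithm nor Chazelle's data structure.

```python
from bisect import bisect_right

def count_pairs_naive(R, S, c):
    """COUNT(*) over R x S WHERE a + b <= c, by enumerating all |R| * |S| pairs."""
    return sum(1 for a in R for b in S if a + b <= c)

def count_pairs_sorted(R, S, c):
    """Same count without enumerating pairs: sort S once, then for each a
    binary-search how many b satisfy b <= c - a.
    Cost O((|R| + |S|) log |S|) instead of O(|R| * |S|)."""
    s_sorted = sorted(S)
    return sum(bisect_right(s_sorted, c - a) for a in R)
```

For `R = [1, 4, 7]`, `S = [2, 3, 5, 9]` and `c = 8`, both functions return 5 (three pairs with a = 1, two with a = 4, none with a = 7).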
While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, the inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases. Recent work on non-interactive algorithms shows that approximate solutions for linear models can be obtained efficiently with only a single round of communication among machines. However, this approximation degenerates as the number of machines increases. In this paper, building on the recent optimal weighted average method, we introduce a new...
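The single-round pattern referred to above can be sketched with the simplest non-interactive scheme: each machine solves its local least-squares problem, sends only its solution to a coordinator, and the coordinator averages. This uses plain unweighted averaging on a one-parameter model y = w x as an assumption for illustration; the optimal weighted average method named in the abstract is not reproduced here, and the function names are hypothetical.

```python
def local_slope(xs, ys):
    """Least-squares slope of y = w * x (no intercept) on one machine's shard:
    w = sum(x * y) / sum(x * x)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

def one_shot_average(shards):
    """One round of communication: each shard is solved locally, and only the
    local solutions (one number each) are sent and averaged."""
    local = [local_slope(xs, ys) for xs, ys in shards]
    return sum(local) / len(local)
```

Communication here is one value per machine regardless of shard size; the weakness the abstract points to is statistical, since each local solution is fitted on less data as machines are added, so a plain average of them degrades.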
This paper is an effort to help prevent broiler chicken mortality caused by stressful conditions. We assume a relation between vocalizations and stress; therefore, microphones were used to monitor a flock of birds over the course of their lifetime (approximately 65 days). A noise removal method based on spectral oversubtraction was developed to filter out the significant fan and heater noise, and was shown to be very effective. Then, a radar processing technique was employed to count the number of vocalizations. It was found that the technique was effective for...
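Spectral oversubtraction removes a scaled estimate of the noise spectrum from each frame's spectrum, using a factor greater than one to suppress residual noise and a small floor so that no bin goes negative. The sketch below works on one frame's magnitude spectrum; the magnitude-domain formulation and the default `alpha` and `beta` values are assumptions, since the abstract does not give the authors' parameters.

```python
def spectral_oversubtraction(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Per-bin magnitude spectral subtraction with an oversubtraction factor.

    noisy_mag: magnitude spectrum of one noisy frame (one value per frequency bin)
    noise_mag: estimated noise magnitude spectrum, e.g. averaged over noise-only frames
    alpha:     oversubtraction factor; values > 1 remove more than the noise estimate
    beta:      spectral floor as a fraction of the noise estimate
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        # Subtract alpha * noise, but never drop below the floor beta * noise.
        cleaned.append(max(y - alpha * n, beta * n))
    return cleaned
```

In a full pipeline, the cleaned magnitudes would typically be recombined with the noisy frame's phase and inverse-transformed back to audio before further processing such as counting vocalizations.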
Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring an understanding of the underlying details of the chosen sparse matrix storage format. In addition, to achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by these issues, we present a user-friendly and open-source sparse matrix class for the C++...
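As an example of the format conversions the abstract describes, the sketch below turns coordinate-list (row, column, value) triplets, which are convenient for building a matrix, into compressed sparse column (CSC) arrays, which are convenient for column-oriented arithmetic. The function name and the choice to sum duplicates and drop explicit zeros are illustrative assumptions, not the API of the class presented in the paper.

```python
def coo_to_csc(n_rows, n_cols, triplets):
    """Convert (row, col, value) triplets to CSC arrays (values, row_indices, col_ptrs).

    The nonzeros of column j are values[col_ptrs[j]:col_ptrs[j + 1]], with the
    matching row numbers in row_indices. Duplicate (row, col) entries are summed
    and explicit zeros are dropped."""
    merged = {}
    for r, c, v in triplets:
        if not (0 <= r < n_rows and 0 <= c < n_cols):
            raise IndexError(f"entry ({r}, {c}) out of bounds")
        merged[(r, c)] = merged.get((r, c), 0) + v
    # CSC needs entries ordered by column, then by row within each column.
    entries = sorted((c, r, v) for (r, c), v in merged.items() if v != 0)
    values, row_indices = [], []
    col_ptrs = [0] * (n_cols + 1)
    for c, r, v in entries:
        values.append(v)
        row_indices.append(r)
        col_ptrs[c + 1] += 1  # count nonzeros per column
    # Prefix sum turns per-column counts into start offsets.
    for j in range(n_cols):
        col_ptrs[j + 1] += col_ptrs[j]
    return values, row_indices, col_ptrs
```

An empty column simply has `col_ptrs[j] == col_ptrs[j + 1]`. Doing this bookkeeping by hand, and choosing when to do it, is the kind of burden a format-hiding matrix class is meant to remove.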
As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer increasingly from communication costs as the data size or the number of iterations grows. Recent work on linear models has shown that a surrogate likelihood can be optimized locally to iteratively improve on an initial solution in a communication-efficient manner. However, existing versions of these methods experience...
Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel,...
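The four-part split can be made concrete with a small example: a tree type, a generic traversal that only calls `score` and `base_case`, and a rules object that supplies the base case and pruning rule for one problem. The sketch below assumes 1-D data, 1-nearest-neighbor search and a simple median-split binary tree; all names (`Node`, `build_tree`, `NearestNeighborRules`, `dual_depth_first`, `dual_tree_nn`) are hypothetical and do not reflect the paper's meta-algorithm or mlpack's API. Practical implementations also cache node bounds instead of recomputing them.

```python
import math
from dataclasses import dataclass, field

# Part 1: the tree (a 1-D binary tree that splits at the median).
@dataclass
class Node:
    points: list   # indices of all points under this node
    lo: float      # smallest coordinate under this node
    hi: float      # largest coordinate under this node
    children: list = field(default_factory=list)

def build_tree(data, idx, leaf_size=2):
    vals = [data[i] for i in idx]
    node = Node(list(idx), min(vals), max(vals))
    if len(idx) > leaf_size:
        order = sorted(idx, key=lambda i: data[i])
        mid = len(order) // 2
        node.children = [build_tree(data, order[:mid], leaf_size),
                         build_tree(data, order[mid:], leaf_size)]
    return node

# Parts 3 and 4: base case and pruning rule for 1-nearest-neighbor search.
class NearestNeighborRules:
    def __init__(self, queries, refs):
        self.q, self.r = queries, refs
        self.best_dist = [math.inf] * len(queries)
        self.best_idx = [-1] * len(queries)

    def base_case(self, qi, ri):
        d = abs(self.q[qi] - self.r[ri])
        if d < self.best_dist[qi]:
            self.best_dist[qi], self.best_idx[qi] = d, ri

    def score(self, qn, rn):
        # Smallest possible distance between any query in qn and any reference in rn.
        gap = max(0.0, rn.lo - qn.hi, qn.lo - rn.hi)
        # Worst current candidate among qn's queries; prune if rn cannot beat it.
        bound = max(self.best_dist[i] for i in qn.points)
        return math.inf if gap > bound else gap

# Part 2: the traversal, which knows nothing about nearest neighbors.
def dual_depth_first(qn, rn, rules):
    if rules.score(qn, rn) == math.inf:
        return  # prune this node combination
    if not qn.children and not rn.children:
        for qi in qn.points:
            for ri in rn.points:
                rules.base_case(qi, ri)
        return
    for qc in (qn.children or [qn]):
        for rc in (rn.children or [rn]):
            dual_depth_first(qc, rc, rules)

def dual_tree_nn(queries, refs, leaf_size=2):
    rules = NearestNeighborRules(queries, refs)
    qroot = build_tree(queries, range(len(queries)), leaf_size)
    rroot = build_tree(refs, range(len(refs)), leaf_size)
    dual_depth_first(qroot, rroot, rules)
    return rules.best_idx, rules.best_dist
```

Swapping in a different problem means writing a new rules class; swapping in a different tree means providing nodes with the same bounding information. The traversal itself stays unchanged, which is the separation the abstract describes.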
Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into...
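For readers unfamiliar with EM training of a GMM, the sketch below shows the two alternating steps for a 1-D mixture: the E-step computes each component's responsibility for each point, and the M-step re-estimates weights, means and variances from responsibility-weighted sums. It is a single-threaded, 1-D illustration under stated assumptions (given starting values, a fixed iteration count, a variance floor for stability); it is not the paper's implementation, and `em_gmm_1d` is a hypothetical name.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by Expectation Maximisation from a given start.

    Starting means could come, for example, from a few k-means iterations."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted sums over all points. Each sum could be
        # split into per-thread partial sums and combined; this sketch does not.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            # The variance floor keeps a component from collapsing onto one point.
            variances[j] = max(var_floor,
                               sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj)
            weights[j] = nj / n
    return means, variances, weights
```

On two well-separated clusters centred near 0 and 10, starting from means 1 and 9, the fitted means move close to 0 and 10 and the weights approach one half each.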