Ryan R. Curtin

ORCID: 0000-0002-9903-8214
Research Areas
  • Data Management and Algorithms
  • Parallel Computing and Optimization Techniques
  • Algorithms and Data Compression
  • Machine Learning and Data Classification
  • Advanced Image and Video Retrieval Techniques
  • Machine Learning and Algorithms
  • Numerical Methods and Algorithms
  • Adversarial Robustness in Machine Learning
  • Face and Expression Recognition
  • Distributed and Parallel Computing Systems
  • Embedded Systems Design Techniques
  • Advanced Clustering Algorithms Research
  • Neural Networks and Applications
  • Sparse and Compressive Sensing Techniques
  • Complexity and Algorithms in Graphs
  • Data Mining Algorithms and Applications
  • Network Security and Intrusion Detection
  • Stochastic Gradient Optimization Techniques
  • Music and Audio Processing
  • Advanced Malware Detection Techniques
  • Anomaly Detection Techniques and Applications
  • Automated Road and Building Extraction
  • Advanced Multi-Objective Optimization Algorithms
  • Advanced Database Systems and Queries
  • Semantic Web and Ontologies

Booz Allen Hamilton (United States)
2024

NortonLifeLock (United States)
2016-2021

Freie Universität Berlin
2018

Czech Academy of Sciences, Institute of Computer Science
2018

Data61
2017

Commonwealth Scientific and Industrial Research Organisation
2017

Georgia Institute of Technology
2011-2015

Georgia Tech Research Institute
2014

The C++ language is often used for implementing functionality that is performance and/or resource sensitive. While the standard C++ library provides many useful algorithms (such as sorting), in its current form it does not provide direct handling of linear algebra (matrix maths). Armadillo is an open source linear algebra library for the C++ language, aiming towards a good balance between speed and ease of use. Its high-level Application Programming Interface (API) is deliberately similar to the widely used Matlab and Octave languages...

10.21105/joss.00026 article EN cc-by The Journal of Open Source Software 2016-06-10

Deep neural networks (DNNs) are powerful nonlinear architectures that are known to be robust to random perturbations of the input. However, these models are vulnerable to adversarial perturbations: small input changes crafted explicitly to fool the model. In this paper, we ask whether a DNN can distinguish adversarial samples from their normal and noisy counterparts. We investigate model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep...

10.48550/arxiv.1703.00410 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Modern malware typically makes use of a domain generation algorithm (DGA) to avoid command and control domains or IPs being seized or sinkholed. This means that an infected system may attempt to access many domains in an attempt to contact the command and control server. Therefore, the automatic detection of DGA domains is an important task, both for the sake of blocking malicious domains and identifying compromised hosts. However, many DGAs use English wordlists to generate plausibly clean-looking domain names; this makes automatic detection difficult. In this work, we devise a notion of difficulty for DGA families called the smashword...

10.1145/3339252.3339258 article EN Proceedings of the 17th International Conference on Availability, Reliability and Security 2019-08-09

MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011, offering both a simple, consistent API accessible to novice users and high performance and flexibility to expert users by leveraging modern features of C++. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. MLPACK version 1.0.3, licensed under the LGPL, is available at http://www.mlpack.org.

10.48550/arxiv.1210.6293 preprint EN other-oa arXiv (Cornell University) 2012-01-01

For over 15 years, the mlpack machine learning library has served as a "swiss army knife" for C++-based machine learning (Curtin et al., 2013). Its efficient implementations of common and cutting-edge machine learning algorithms have been used in a wide variety of scientific and industrial applications. This paper overviews mlpack 4, a significant upgrade over its predecessor (Curtin et al., 2018). The library has been significantly refactored and redesigned to facilitate an easier prototyping-to-deployment pipeline, including bindings to other languages (Python, Julia, R, Go, and the command...

10.21105/joss.05026 article EN cc-by The Journal of Open Source Software 2023-02-01

A major challenge in the deployment of scientific software solutions is the adaptation of research prototypes to production-grade code. While high-level languages like MATLAB are useful for rapid prototyping, they lack the resource efficiency required for scalable production applications, necessitating translation into lower level languages like C++. Further, for machine learning and signal processing applications, the underlying linear algebra primitives, generally provided by the standard BLAS and LAPACK libraries, are unwieldy and difficult to use, requiring...

10.48550/arxiv.2502.03000 preprint EN arXiv (Cornell University) 2025-02-05

The wide applicability of kernels makes the problem of max-kernel search ubiquitous and more general than the usual similarity search in metric spaces. We focus on solving this problem efficiently. We begin by characterizing the inherent hardness of the max-kernel search problem with a novel notion of directional concentration. Following that, we present a method to use an $O(n \log n)$ algorithm to index any set of objects (points in $\mathbb{R}^D$ or abstract objects) directly in the Hilbert space without any explicit feature representations of the objects in this space. We present the first provably $O(\log n)$ algorithm for exact max-kernel search using...

10.1137/1.9781611972832.1 article EN 2013-05-02

The problem of max-kernel search arises everywhere: given a query point $p_q$, a set of reference objects $S_r$ and some kernel...

10.1002/sam.11218 article EN Statistical Analysis and Data Mining: The ASA Data Science Journal 2014-05-13
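
The max-kernel search problem described in the abstract above can be illustrated with a brute-force baseline: evaluate the kernel between the query and every reference object and keep the largest value. This is only a minimal sketch of the problem statement, not the indexed tree-based algorithm from the papers. The function names, the two example kernels, and the sample points are assumptions made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Sequence[float]


def linear_kernel(x: Point, y: Point) -> float:
    """Plain dot product; larger norms give larger kernel values."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x: Point, y: Point, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel; largest for the closest point."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def max_kernel_search(
    query: Point,
    references: Sequence[Point],
    kernel: Callable[[Point, Point], float],
) -> Tuple[int, float]:
    """Return (index, value) of the reference maximizing kernel(query, ref).

    Linear scan: one kernel evaluation per reference object.
    """
    if not references:
        raise ValueError("references must not be empty")
    best_index = 0
    best_value = kernel(query, references[0])
    for i in range(1, len(references)):
        value = kernel(query, references[i])
        if value > best_value:
            best_index, best_value = i, value
    return best_index, best_value


# Hypothetical sample data, chosen only to show that the answer depends on the kernel.
references = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
query = (0.9, 1.2)

gauss_index, gauss_value = max_kernel_search(query, references, gaussian_kernel)
lin_index, lin_value = max_kernel_search(query, references, linear_kernel)
```

With the Gaussian kernel the result coincides with the nearest neighbor of the query, while the linear kernel prefers the reference with the largest inner product. This is one way to see why max-kernel search is more general than similarity search in a metric space.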

Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables. We refer to these queries as FAQ-AI for short. To answer FAQ-AI in the Boolean semiring, we define relaxed tree decompositions and relaxed submodular and fractional hypertree width parameters. We show that an extension of the InsideOut algorithm using Chazelle's geometric data structure...

10.1145/3294052.3319694 article EN 2019-06-17

While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases. Recent work on non-interactive algorithms shows that approximate solutions for linear models can be obtained efficiently with only a single round of communication among machines. However, this approximation often degenerates as the number of machines increases. In this paper, building on the recent optimal weighted average method, we introduce a new...

10.48550/arxiv.2406.01753 preprint EN arXiv (Cornell University) 2024-06-03

Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables. We refer to these queries as FAQ-AI for short. To answer FAQ-AI in the Boolean semiring, we define relaxed tree decompositions and relaxed submodular and fractional hypertree width parameters. We show that an extension of the InsideOut algorithm using Chazelle’s geometric data structure...

10.1145/3426865 article EN ACM Transactions on Database Systems 2020-12-06

This paper is an effort to help prevent broiler chicken mortality caused by stressful conditions. We assume a relation between vocalizations and stress; therefore, microphones were used to monitor a flock of birds over the course of their lifetime (approximately 65 days). A noise removal method based on spectral oversubtraction was developed to filter out significant fan and heater noise and was shown to be very effective. Then, a radar processing technique was employed to count the number of vocalizations. It was found that this technique was effective for...

10.1109/globalsip.2014.7032300 article EN 2014-12-01

Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of underlying details of the chosen sparse matrix storage format. In addition, to achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by these issues, we present a user-friendly and open-source sparse matrix class for the C++...

10.3390/mca24030070 article EN cc-by Mathematical and Computational Applications 2019-07-19

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer increasingly from communication costs as the data size or the number of iterations grows. Recent work on linear models has shown that a surrogate likelihood can be optimized locally to iteratively improve on an initial solution in a communication-efficient manner. However, existing versions of these methods experience...

10.1145/3637528.3672038 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2024-08-24

Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel,...

10.48550/arxiv.1304.4327 preprint EN other-oa arXiv (Cornell University) 2013-01-01

Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into...

10.1109/icspcs.2017.8270510 preprint EN 2017-12-01