David P. Hofmeyr

ORCID: 0000-0003-3068-8128
Research Areas
  • Advanced Clustering Algorithms Research
  • Face and Expression Recognition
  • Remote-Sensing Image Classification
  • Bayesian Methods and Mixture Models
  • Statistical Methods and Inference
  • Gaussian Processes and Bayesian Inference
  • Advanced Statistical Methods and Models
  • Soil Geostatistics and Mapping
  • Neural Networks and Applications
  • Statistical Methods and Bayesian Inference
  • Data Management and Algorithms
  • Machine Learning and Algorithms
  • Complex Network Analysis Techniques
  • Remote Sensing in Agriculture
  • Bayesian Modeling and Causal Inference
  • Constraint Satisfaction and Optimization
  • Machine Learning and Data Classification
  • Data Mining Algorithms and Applications
  • Image and Signal Denoising Methods
  • Control Systems and Identification
  • Blind Source Separation Techniques
  • Time Series Analysis and Forecasting
  • Soil and Unsaturated Flow
  • Soil and Land Suitability Analysis
  • Evolutionary Algorithms and Applications

Lancaster University
2015-2024

Stellenbosch University
2018-2023

In digital soil mapping (DSM), maps are usually produced in a univariate manner; that is, each map is produced independently and, therefore, when multiple soil properties are mapped, the underlying dependence structure between these properties is ignored. This may lead to inconsistent predictions and simulations. For example, maps of soil organic carbon (SOC) and total nitrogen (TN) may show unrealistic carbon–nitrogen (C:N) ratios. In the last decade, the production of soil maps with machine learning models has become increasingly popular, as they are able to capture complex...

10.1016/j.geoderma.2023.116365 article EN cc-by Geoderma 2023-02-04

This paper presents a two-stage maximum likelihood framework to deal with measurement errors in digital soil mapping (DSM) when using a machine learning (ML) model. The framework is implemented with random forest and projection pursuit regression to illustrate two different areas of machine learning, i.e. ensemble trees and feature learning. In our proposed framework, the measurement error variance (MEV) is incorporated as a weight in the log-likelihood function, so that measurements with larger MEV receive less weight when the ML model is calibrated. We evaluate...

10.1016/j.spasta.2021.100572 article EN cc-by Spatial Statistics 2021-12-16
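
The weighting idea can be illustrated with a plain Gaussian likelihood. This is a minimal sketch under our own assumptions, not the paper's two-stage framework: each observation i is assumed to have a known measurement error variance `mev[i]` and a total variance `sigma2 + mev[i]`, where `sigma2` is a model error variance; a linear model stands in for random forest or projection pursuit regression, and all function names are hypothetical.

```python
import numpy as np

def me_weights(sigma2, mev):
    """Hypothetical precision weights: larger measurement error variance -> smaller weight."""
    return 1.0 / (sigma2 + np.asarray(mev, dtype=float))

def gaussian_loglik(y, mu, sigma2, mev):
    """Gaussian log-likelihood with per-observation variance sigma2 + mev[i] (our assumption)."""
    var = sigma2 + np.asarray(mev, dtype=float)
    resid = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + resid**2 / var))

def fit_weighted_linear(x, y, mev, n_iter=20):
    """Alternate weighted least squares for the mean and a grid MLE for sigma2.

    A linear model stands in for the ML model; this is not the paper's algorithm.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), x])   # intercept + covariate(s)
    grid = np.logspace(-4, 1, 300) * np.var(y)   # candidate model error variances
    sigma2 = np.var(y)
    for _ in range(n_iter):
        sw = np.sqrt(me_weights(sigma2, mev))    # scale rows by sqrt of the weights
        beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
        mu = X1 @ beta
        sigma2 = grid[np.argmax([gaussian_loglik(y, mu, s2, mev) for s2 in grid])]
    return beta, sigma2
```

Observations with large `mev` contribute less to both the fitted mean and the estimate of `sigma2`, which is the behaviour the abstract describes.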

Minimum normalised graph cuts are highly effective ways of partitioning unlabeled data, having been made popular by the success of spectral clustering. This work presents a novel method for learning hyperplane separators which minimise this graph cut objective, when data are embedded in Euclidean space. The optimisation problem associated with the proposed method can be formulated as a sequence of univariate subproblems, in which the optimal hyperplane orthogonal to a given vector is determined. These subproblems can be solved in log-linear time,...

10.1109/tpami.2016.2609929 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2016-09-15
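
For a fixed direction, the univariate subproblem amounts to choosing the split point along the projected data that minimises the normalised cut. The sketch below evaluates every split between consecutive sorted projections by brute force, assuming Gaussian similarities with a hypothetical scale `sigma` computed on the projected values. It costs O(n^2) per direction, so it is not the log-linear algorithm of the paper, and it does not optimise the direction itself.

```python
import numpy as np

def best_ncut_split(X, v, sigma=1.0):
    """Brute-force minimum normalised cut split of X along direction v (illustrative)."""
    v = np.asarray(v, dtype=float) / np.linalg.norm(v)
    p = np.sort(np.asarray(X, dtype=float) @ v)
    W = np.exp(-((p[:, None] - p[None, :]) ** 2) / (2 * sigma**2))  # similarities on projections
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    total = deg.sum()
    cut, vol_left = 0.0, 0.0
    best = (np.inf, None)
    for i in range(len(p) - 1):
        # move point i from the right side to the left side of the split
        cut += W[i, i + 1:].sum() - W[i, :i].sum()
        vol_left += deg[i]
        if vol_left <= 0.0 or total - vol_left <= 0.0:
            continue
        ncut = cut / vol_left + cut / (total - vol_left)
        if ncut < best[0]:
            best = (ncut, 0.5 * (p[i] + p[i + 1]))  # split midway between neighbours
    return best  # (normalised cut value, offset b of the hyperplane v.x = b)
```

The incremental update keeps the cut value current as points cross the split, which is the same sweep structure a faster method would exploit.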

Associating distinct groups of objects (clusters) with contiguous regions of high probability density (high-density clusters) is central to many statistical and machine learning approaches to the classification of unlabelled data. We propose a novel hyperplane classifier for clustering and semi-supervised classification which is motivated by this objective. The proposed minimum density hyperplane minimises the integral of the empirical probability density function along it, thereby avoiding intersection with high-density clusters. We show that the minimum density and the maximum margin hyperplanes are asymptotically...

10.5555/2946645.3053438 article EN Journal of Machine Learning Research 2016-01-01
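
For a unit direction v and an isotropic Gaussian kernel, the integral of the kernel density estimate over the hyperplane {x : v.x = b} equals the univariate kernel density estimate of the projected sample evaluated at b. The sketch uses this to locate, for a fixed direction, the offset b of lowest density. Restricting b to within `alpha` standard deviations of the projected mean and the rule-of-thumb bandwidth are our own simplifications, and the optimisation over v is omitted.

```python
import numpy as np

def projected_kde(p, grid, h):
    """Univariate Gaussian KDE of the projections p, evaluated on grid."""
    z = (grid[:, None] - p[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(p) * h * np.sqrt(2 * np.pi))

def min_density_offset(X, v, h=None, alpha=1.0, n_grid=400):
    """Offset b minimising the projected density along unit direction v (grid search)."""
    v = np.asarray(v, dtype=float) / np.linalg.norm(v)
    p = np.asarray(X, dtype=float) @ v
    if h is None:
        h = 0.9 * p.std() * len(p) ** (-0.2)  # rule-of-thumb bandwidth (an assumption)
    lo, hi = p.mean() - alpha * p.std(), p.mean() + alpha * p.std()
    grid = np.linspace(lo, hi, n_grid)
    dens = projected_kde(p, grid, h)
    i = int(np.argmin(dens))
    return grid[i], dens[i]  # (offset b, density on the hyperplane)
```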

The notion of clusterability is often used to determine how strong the cluster structure within a set of data is, as well as to assess the quality of a clustering model. In multivariate applications, however, the clusterability of a data set can be obscured by irrelevant or noisy features. We study the problem of finding low dimensional projections which maximise the clusterability of a data set. In particular, we seek low dimensional representations which maximise the clusterability of a binary partition. We use this bi-partitioning recursively to generate high quality clustering models. We illustrate the improvement over standard dimension reduction and clustering techniques,...

10.1109/ssci.2015.116 article EN IEEE Symposium Series on Computational Intelligence 2015-12-01
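
One simple way to score how well a projection admits a binary partition, used here purely as an illustrative assumption, is the largest ratio of between-cluster to total sum of squares over all splits of the sorted projected values, which is the exact one-dimensional two-means solution. The sketch computes it in O(n log n) with prefix sums and picks the best of a few candidate directions; the paper's clusterability measures and its optimisation over projections may differ.

```python
import numpy as np

def split_variance_ratio(p):
    """Best between/total sum-of-squares ratio over all 2-way splits of the values p."""
    p = np.sort(np.asarray(p, dtype=float))
    n = len(p)
    csum = np.cumsum(p)
    total_ss = ((p - p.mean()) ** 2).sum()
    k = np.arange(1, n)                            # size of the left group
    mean_left = csum[:-1] / k
    mean_right = (csum[-1] - csum[:-1]) / (n - k)
    between = k * (mean_left - p.mean()) ** 2 + (n - k) * (mean_right - p.mean()) ** 2
    i = int(np.argmax(between))
    return between[i] / total_ss, 0.5 * (p[i] + p[i + 1])  # (ratio, split point)

def most_clusterable_direction(X, directions):
    """Candidate direction whose best binary split has the largest ratio."""
    X = np.asarray(X, dtype=float)
    scores = []
    for v in directions:
        v = np.asarray(v, dtype=float) / np.linalg.norm(v)
        scores.append(split_variance_ratio(X @ v)[0])
    best = int(np.argmax(scores))
    return directions[best], scores[best]
```

Applying the best split recursively to each side would give a divisive clustering in the spirit of the abstract.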

10.1016/j.csda.2020.106974 article EN Computational Statistics & Data Analysis 2020-04-13

10.48550/arxiv.1507.04201 preprint EN other-oa arXiv (Cornell University) 2015-01-01

In digital soil mapping, modelling soil thickness poses a challenge due to the prevalent issue of right‐censored data. This means that the true soil thickness exceeds the depth of sampling, and neglecting to account for the censored nature of the data can lead to poor model performance and underestimation of soil thickness. Survival analysis is a well‐established domain of statistics that deals with censored data. The random survival forest is a notable example of a survival‐related machine learning approach used to address censored soil properties in digital soil mapping. Previous studies employed this...

10.1111/ejss.13589 article EN cc-by European Journal of Soil Science 2024-09-01
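
Right-censoring can be illustrated with the Kaplan-Meier estimator, the basic nonparametric tool of survival analysis (not the random survival forest discussed in the paper). In the sketch, `depth` is the observed soil thickness or, for censored profiles, the sampling depth, and `observed` is 1 when the bottom of the soil was reached and 0 when the profile was censored; the names and example values are hypothetical.

```python
import numpy as np

def kaplan_meier(depth, observed):
    """Kaplan-Meier estimate of P(thickness > t) at each distinct uncensored depth t."""
    depth = np.asarray(depth, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    times = np.unique(depth[observed])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(depth >= t)              # profiles not yet ended just before t
        events = np.sum((depth == t) & observed)  # profiles whose bottom was found at t
        s *= 1.0 - events / at_risk
        surv.append(s)
    return times, np.array(surv)

# Hypothetical profiles (cm): three censored at the sampling depth.
times, surv = kaplan_meier([50, 80, 100, 100, 120, 150], [1, 1, 0, 1, 0, 0])
```

Here the estimate at 100 cm is 0.5, whereas treating the censored depths as true thickness would give P(thickness > 100) = 2/6 ≈ 0.33, which shows the underestimation the abstract warns about.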

This paper presents new methodology for computationally efficient kernel density estimation. It is shown that a large class of kernels allows exact evaluation of the density estimates using simple recursions. The same methodology can be used to compute density derivative estimates exactly. Given an ordered sample, the computational complexity is linear in the sample size. Combining the proposed methodology with existing approximation methods results in extremely fast density estimation. Extensive experimentation documents the effectiveness and efficiency of this approach compared with the state of the art.

10.1109/tpami.2019.2930501 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-07-23
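
The recursive idea can be shown with the Laplace kernel K(u) = exp(-|u|)/2, taken here as the simplest illustrative case; the class of kernels in the paper is more general. On a sorted sample, the sums of exp(-|x_i - x_j|/h) over points to the left and to the right of x_i each satisfy a one-step recursion, so the estimate at all n sample points costs O(n) after an O(n log n) sort. This is a sketch, not the paper's implementation.

```python
import numpy as np

def laplace_kde_at_samples(x, h):
    """Exact Laplace-kernel KDE at every sample point in O(n) after sorting."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    n = len(xs)
    decay = np.exp(-np.diff(xs) / h)   # exp(-(x_{i+1} - x_i)/h)
    left = np.empty(n)                 # left[i]  = sum_{j<=i} exp(-(x_i - x_j)/h)
    right = np.zeros(n)                # right[i] = sum_{j>i}  exp(-(x_j - x_i)/h)
    left[0] = 1.0
    for i in range(1, n):
        left[i] = 1.0 + decay[i - 1] * left[i - 1]
    for i in range(n - 2, -1, -1):
        right[i] = decay[i] * (1.0 + right[i + 1])
    dens = np.empty(n)
    dens[order] = (left + right) / (2.0 * n * h)  # back to the original sample order
    return dens
```

The naive double sum gives the same values in O(n^2) time, which makes a convenient correctness check.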

Spectral clustering is a popular and versatile clustering method based on a relaxation of the normalised graph cut objective. Despite its popularity, however, there is no single agreed upon method for tuning the important scaling parameter, nor for automatically determining the number of clusters to extract. Popular heuristics exist, but corresponding theoretical results are scarce. In this paper we investigate the asymptotic value of the normalised cut for an increasing sample assumed to arise from an underlying probability distribution, and based on this result provide...

10.1080/10618600.2019.1593180 article EN Journal of Computational and Graphical Statistics 2019-03-19
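
To make the two tuning choices concrete, the sketch builds a Gaussian similarity matrix with scaling parameter `sigma`, forms the symmetric normalised Laplacian and applies the eigengap heuristic, one of the popular heuristics for choosing the number of clusters. It is a generic illustration, not the asymptotic analysis or recommendation of the paper, and `sigma` is left to the user.

```python
import numpy as np

def normalised_laplacian_eigs(X, sigma):
    """Eigenvalues (ascending) of I - D^{-1/2} W D^{-1/2} for a Gaussian similarity graph."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)

def eigengap_num_clusters(X, sigma, k_max=10):
    """Choose k at the largest gap between consecutive small Laplacian eigenvalues."""
    ev = normalised_laplacian_eigs(X, sigma)[: k_max + 1]
    return int(np.argmax(np.diff(ev))) + 1
```

With well-separated groups the first k eigenvalues are close to zero and the next one jumps, but the location of that jump depends on `sigma`, which is exactly the sensitivity the abstract describes.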

In digital soil mapping (DSM), maps are usually produced in a univariate manner; that is, each map is produced independently and, therefore, when multiple soil properties are mapped, the underlying dependence structure between these properties is ignored. This may lead to inconsistent predictions and simulations. For example, maps of soil organic carbon and total nitrogen may show unrealistic carbon-nitrogen ratios. In the last decade, the production of soil maps with machine learning models has become increasingly popular, as they are able to capture complex non-linear relationships...

10.2139/ssrn.4240513 article EN SSRN Electronic Journal 2022-01-01

In the Naive Bayes classification model, the class conditional densities are estimated as the products of their marginal densities along the cardinal basis directions. We study the problem of obtaining an alternative basis for this factorisation with the objective of enhancing the discriminatory power of the associated classification model. We formulate the problem as a projection pursuit to find the optimal linear projection on which to perform classification. Optimality is determined based on the multinomial likelihood within which probabilities are estimated using the Naive Bayes factorisation of the projected data. Projection pursuit offers the added benefits...

10.48550/arxiv.2409.05635 preprint EN arXiv (Cornell University) 2024-09-09
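
The objective can be sketched as follows: rotate the data by an orthonormal matrix `Q`, fit a Naive Bayes model on the rotated coordinates and score `Q` by the multinomial log-likelihood of the training labels. For brevity we assume Gaussian marginals rather than any nonparametric estimates the paper may use, and we only compare fixed bases instead of optimising `Q`; all names are hypothetical.

```python
import numpy as np

def nb_loglik(X, y, Q):
    """Multinomial log-likelihood of labels under Gaussian Naive Bayes fitted on X @ Q."""
    Z = np.asarray(X, dtype=float) @ Q
    y = np.asarray(y)
    classes = np.unique(y)
    log_joint = np.empty((len(Z), len(classes)))
    for c_idx, c in enumerate(classes):
        Zc = Z[y == c]
        mu, var = Zc.mean(axis=0), Zc.var(axis=0) + 1e-9
        log_prior = np.log(len(Zc) / len(Z))
        # product of marginal densities -> sum of marginal log densities
        log_joint[:, c_idx] = log_prior - 0.5 * np.sum(
            np.log(2 * np.pi * var) + (Z - mu) ** 2 / var, axis=1
        )
    # log-sum-exp normalisation gives log P(class | z)
    m = log_joint.max(axis=1, keepdims=True)
    log_post = log_joint - m - np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return float(log_post[np.arange(len(y)), np.searchsorted(classes, y)].sum())
```

When the classes differ along a direction that mixes the original features, a rotated basis aligned with that direction can give a much higher likelihood than the identity basis.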

This paper introduces the R package FKSUM, which offers fast and exact evaluation of univariate kernel smoothers. The main computations are implemented in C++ and wrapped in simple, intuitive and versatile R functions. The fast evaluation of kernel smoothers is based on recursive expressions involving the order statistics, which allows exact evaluation of kernel smoothers at all sample points in log-linear time. In addition to general purpose kernel smoothing functions, the package has built-in, ready-to-use implementations of popular kernel-type estimators. On top of these basic smoothing problems, this paper focuses on projection...

10.18637/jss.v101.i03 article EN cc-by Journal of Statistical Software 2022-01-01
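
The same kind of recursion extends from density estimates to kernel smoothers by carrying weights through the sums. As an illustration in Python (not the package's R/C++ code or API), the sketch computes a Nadaraya-Watson regression estimate with an assumed Laplace kernel at every sample point, using one O(n log n) sort followed by linear-time passes.

```python
import numpy as np

def _exp_kernel_sums(xs, w, h):
    """sum_j w_j * exp(-|x_i - x_j| / h) for sorted xs, via forward and backward recursions."""
    n = len(xs)
    decay = np.exp(-np.diff(xs) / h)
    left = np.empty(n)
    right = np.zeros(n)
    left[0] = w[0]
    for i in range(1, n):
        left[i] = w[i] + decay[i - 1] * left[i - 1]
    for i in range(n - 2, -1, -1):
        right[i] = decay[i] * (w[i + 1] + right[i + 1])
    return left + right

def nadaraya_watson_at_samples(x, y, h):
    """Laplace-kernel Nadaraya-Watson fit at each x_i, returned in the original order."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    num = _exp_kernel_sums(xs, ys, h)                 # weighted by the responses
    den = _exp_kernel_sums(xs, np.ones_like(ys), h)   # unweighted kernel mass
    fit = np.empty(len(xs))
    fit[order] = num / den
    return fit
```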

This paper investigates the model degrees of freedom in k-means clustering. An extension of Stein's lemma provides an expression for the effective degrees of freedom in the k-means model. Approximating the degrees of freedom in practice requires simplifications of this expression; however, empirical studies evince the appropriateness of our proposed approach. The practical relevance of this new degrees of freedom formulation for k-means is demonstrated through model selection using the Bayesian Information Criterion. The reliability of this method is validated through experiments on simulated data as well as on a large collection of publicly...

10.48550/arxiv.1806.02034 preprint EN other-oa arXiv (Cornell University) 2018-01-01
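
For orientation, a common way of scoring a k-means solution with the BIC treats the data as spherical Gaussian clusters with a shared variance and counts k·d free parameters for the centroids. The sketch encodes that formula with the degrees of freedom as an explicit argument, since the paper's point is that an effective degrees of freedom can replace the naive count; the function name and default are our assumptions, and the variance parameter is not counted.

```python
import numpy as np

def kmeans_bic(sse, n, d, k, df=None):
    """BIC (lower is better), up to additive constants, for a k-means fit.

    Assumes a spherical Gaussian model whose common variance is estimated as
    sse / (n * d). If df is None, the naive count k * d (centroid coordinates)
    is used; an effective degrees of freedom estimate can be passed instead.
    """
    if df is None:
        df = k * d
    return n * d * np.log(sse / (n * d)) + df * np.log(n)
```

For a fixed within-cluster sum of squares `sse`, a larger degrees of freedom value raises the criterion, so the choice of `df` directly controls how strongly additional clusters are penalised.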

We propose a projection pursuit method based on semi-supervised spectral connectivity. The projection index is given by the second eigenvalue of the graph Laplacian of the projected data. An incomplete label set is used to modify the pairwise similarities between data in such a way that penalises projections which do not admit separation of the classes (within the training data). We show that the global optimum of the proposed problem converges to the Transductive Support Vector Machine solution as the scaling parameter is reduced to zero. We evaluate the performance of the proposed method on benchmark data sets.

10.1109/robomech.2015.7359523 article EN 2015-11-01
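
The projection index can be sketched for one-dimensional projections: compute Gaussian similarities between projected points, optionally adjust them using the partial labels, and take the second smallest eigenvalue of the normalised graph Laplacian. The label adjustment below (labelled pairs from different classes get similarity 0, labelled pairs from the same class get similarity 1) is a hypothetical choice for illustration, not the modification defined in the paper, and the search over projection directions is omitted.

```python
import numpy as np

def spectral_connectivity_index(X, v, sigma=1.0, labels=None):
    """Second smallest eigenvalue of the normalised Laplacian of the projected data X @ v.

    labels: optional array with a class id for labelled points and -1 for unlabelled ones.
    """
    v = np.asarray(v, dtype=float) / np.linalg.norm(v)
    p = np.asarray(X, dtype=float) @ v
    W = np.exp(-((p[:, None] - p[None, :]) ** 2) / (2 * sigma**2))
    if labels is not None:
        lab = np.asarray(labels)
        known = lab >= 0
        both = known[:, None] & known[None, :]
        same = lab[:, None] == lab[None, :]
        W = np.where(both & same, 1.0, W)    # hypothetical: tie same-class points together
        W = np.where(both & ~same, 0.0, W)   # hypothetical: cut links between classes
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(p)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return float(np.linalg.eigvalsh(L)[1])
```

A small index means the projected graph is nearly disconnected, so minimising it over directions favours projections that separate the data.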

Digital soil mapping (DSM) may be defined as the use of a statistical model to quantify the relationship between a certain soil property observed at various geographic locations and a collection of environmental covariates, and then using this model to predict the property at locations where it was not measured. It is also important to quantify the uncertainty of the predictions in these maps. An important source of uncertainty in DSM is measurement error, which is considered to be the difference between the measured value and the true value of a soil property. The machine...

10.5194/egusphere-egu21-9704 article EN 2021-03-04