- scientometrics and bibliometrics research
- Complex Network Analysis Techniques
- Expert finding and Q&A systems
- Mosquito-borne diseases and control
- Topic Modeling
- Natural Language Processing Techniques
- Text and Document Classification Technologies
- Viral Infections and Vectors
- Advanced Text Analysis Techniques
- Software Engineering Research
- Genomics and Phylogenetic Studies
- Machine Learning in Bioinformatics
- Information Retrieval and Search Behavior
- Dengue and Mosquito Control Research
- Software Engineering Techniques and Practices
- Virology and Viral Diseases
- Neurobiology of Language and Bilingualism
- Data Visualization and Analytics
- Web Data Mining and Analysis
- Insect-Plant Interactions and Control
- COVID-19 epidemiological studies
- Scientific Computing and Data Management
- Web visibility and informetrics
- Educational and Psychological Assessments
- Software System Performance and Reliability
Stellenbosch University
2012-2025
National Research University Higher School of Economics
2015
Summary Oropouche virus (OROV) is an emerging arbovirus with increasing outbreaks in South America, yet its environmental drivers and potential range remain poorly understood. Using ecological niche modeling (ENM) random forests, we assessed the suitability of OROV primary vector, Culicoides paraensis, across Brazil and the Americas. We evaluated five pseudo-absence sampling techniques, considering ratios, buffer radii, and density smoothing factors to determine the most effective approach. Key predictors...
Abstract In March 2024, the Pan American Health Organization (PAHO) issued an alert in response to a rapid increase in Oropouche fever cases across South America. Brazil has been particularly affected, reporting a novel reassortant lineage of Oropouche virus (OROV) and expansion to previously non-endemic areas beyond the Amazon Basin. Utilising phylogeographic approaches, we reveal a multi-scale process with both short- and long-distance dispersal events, and diffusion velocities in line with human-mediated jumps. We identify...
The dengue virus poses a major global health threat, with nearly 390 million infections annually. A recently proposed hierarchical nomenclature system enhances spatial resolution by defining major and minor lineages within genotypes, aiding efforts to track viral evolution. While current subtyping tools - Genome Detective, GLUE, and NextClade - rely on computationally intensive sequence alignment and phylogenetic inference, machine learning presents a promising alternative for achieving accurate and rapid...
The rapid increase in nucleotide sequence data generated by next-generation sequencing (NGS) technologies demands efficient computational tools for sequence comparison. Alignment-free (AF) methods offer a scalable alternative to traditional alignment-based approaches such as BLAST. This study evaluates alignment-free methods as scalable alternatives for viral sequence classification, focusing on identifying techniques that maintain high accuracy and efficiency when applied to extremely large datasets. We employed six established AF...
The research presented in this paper focuses on comparing and evaluating various ranking algorithms that can be used on citation graphs in order to rank individual papers according to their importance and relevance. The graph analysis algorithms investigated are PageRank, CiteRank and an algorithm proposed by Hwang et al., compared to the method of simply counting the number of citations of a publication. In addition, a new algorithm, NewRank, is proposed, which is a combination of PageRank and CiteRank with the focus on identifying influential papers that were published recently. A...
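With the goal of helping software engineering researchers understand how to improve their papers, Mary Shaw presented "Writing Good Software Engineering Research Papers" in 2003. Shaw analyzed the abstracts of the papers submitted to the 2002 International Conference on Software Engineering (ICSE) to determine trends in research question type, contribution type, and validation approach. We revisit Shaw's work to see how the software engineering research community has evolved since 2002. The goal of this paper is to aid software engineering researchers in understanding trends in research question design, research question type, and validation approach by analyzing the abstracts of the papers submitted to ICSE 2016. implemented recommendation...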
Abstract Background The rapid increase in nucleotide sequence data generated by next-generation sequencing (NGS) technologies demands efficient computational tools for sequence comparison. Alignment-based methods, such as BLAST, are increasingly overwhelmed by the scale of contemporary datasets due to their high computational demands for classification. This study evaluates alignment-free (AF) methods as scalable alternatives for viral sequence classification, focusing on identifying techniques that maintain accuracy and efficiency when applied...
The INFORM-Africa Consortium, a research hub of the NIH-funded DS-I Africa, will leverage the Data Management and Analysis Core (DMAC) and Next Generation Sequencing (NGS) to ensure effective data management and analysis. The DMAC will capture and analyse data, making it accessible to collaborators across multiple African countries and future hubs. The aim is to increase access to high-quality, reproducible data that can be used to engage policymakers and better prepare for pandemics, while also removing barriers to data sharing and integration...
Abstract Background Dengue is a significant global public health concern that poses a threat to Africa. Particularly, African countries are at risk of viral introductions through air travel connectivity with areas of South America and Asia that experience frequent explosive outbreaks. Limited reporting and diagnostic capacity hinder the comprehensive assessment of continent-wide transmission dynamics and the deployment of surveillance strategies in Africa. This study aimed to identify airports at high risk of receiving dengue infected...
Hierarchical Text Classification (HTC) is a natural language processing task with the objective to classify text documents into a set of classes from a structured class hierarchy. Many HTC approaches have been proposed which attempt to leverage the class hierarchy information in various ways to improve classification performance. Machine learning-based classification approaches require large amounts of training data and are most-commonly compared through three established benchmark datasets, which include the Web Of Science (WOS), Reuters Corpus...
Abstract Hierarchical text classification (HTC) is a natural language processing task which aims to categorise a text document into a set of classes from a hierarchical class structure. Recent approaches to solve HTC tasks focus on leveraging pre-trained language models (PLMs) and the hierarchical class structure by allowing these components to interact in various ways. Specifically, the Hierarchy-aware Prompt Tuning (HPT) method has proven to be effective in applying the prompt tuning paradigm to Bidirectional Encoder Representations from Transformers...
To address the gap in natural language processing for Southern African languages, our paper presents an in-depth analysis of language model development under resource-constrained conditions. We investigate the interplay between model size, pretraining objectives, and multilingual dataset composition in the context of low-resource languages such as Zulu and Xhosa. In our approach, we initially pretrain language models from scratch on specific languages using a variety of configurations, and incrementally add related languages to explore the effect of additional...
Methods such as the h-index and journal impact factor are commonly used by the scientific community to quantify the quality or impact of research output. These methods rely primarily on citation frequency without taking the context of citations into consideration. Furthermore, these methods weigh each citation equally, ignoring valuable citation characteristics, such as citation intent and sentiment. The correct classification of citation intents and sentiments can therefore be used to further improve scientometric metrics. In this paper we evaluate BERT for intent and sentiment classification of in-text ci-...