- Topic Modeling
- Natural Language Processing Techniques
- Computational Drug Discovery Methods
- Biomedical Text Mining and Ontologies
- Machine Learning in Healthcare
- Sentiment Analysis and Opinion Mining
- Pharmacovigilance and Adverse Drug Reactions
- Semantic Web and Ontologies
- Bioinformatics and Genomic Networks
- Software Engineering Research
- Gene Expression and Cancer Classification
- Advanced Text Analysis Techniques
- Molecular Biology Techniques and Applications
- Multi-Agent Systems and Negotiation
- Medical Coding and Health Information
- Mental Health via Writing
- Evolutionary Psychology and Human Behavior
- Intelligent Tutoring Systems and Adaptive Learning
- Chaos-based Image/Signal Encryption
- Machine Learning and Data Classification
- Advanced Steganography and Watermarking Techniques
- Mathematics Education and Pedagogy
- English Language Learning and Teaching
- Neuroscience and Music Perception
- Multimodal Machine Learning Applications
University of Edinburgh
2023-2025
Epigenetics and Cell Fate
2022-2023
Medigene (Germany)
2022
Binus University
2017-2021
Abstract. Background: Variability in datasets is not only the product of biological processes: it is also the product of technical biases. ComBat and ComBat-Seq are among the most widely used tools for correcting those biases, called batch effects, in microarray and RNA-Seq expression data, respectively. Results: In this note, we present a new Python implementation of ComBat-Seq. While the mathematical framework is strictly the same, we show here that our implementations: (i) have similar results in terms of batch effects correction; (ii) are as...
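The empirical-Bayes model behind ComBat is beyond a short sketch, but the basic idea of batch-effect correction (aligning each batch's per-gene location and scale to the pooled values) can be illustrated in a few lines. The function below is an illustrative toy, not the ComBat/ComBat-Seq algorithm:

```python
import numpy as np

def center_scale_by_batch(expr, batches):
    """Toy location-scale batch adjustment: align each batch's per-gene
    mean and standard deviation to the pooled (grand) values.
    Illustrates the idea of batch-effect correction; this is NOT the
    empirical-Bayes ComBat/ComBat-Seq model.

    expr: genes x samples matrix; batches: per-sample batch labels."""
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(expr)
    grand_mean = expr.mean(axis=1, keepdims=True)
    grand_std = expr.std(axis=1, keepdims=True)
    for b in np.unique(batches):
        cols = batches == b
        mu = expr[:, cols].mean(axis=1, keepdims=True)
        sd = expr[:, cols].std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0  # avoid division by zero for flat genes
        out[:, cols] = (expr[:, cols] - mu) / sd * grand_std + grand_mean
    return out
```

After this adjustment, every batch shares the same per-gene mean, so a constant shift added to one batch is removed.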
Abstract. Objectives: The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels. Materials and Methods: Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained and evaluated on a test set. We...
The impressive performance of modern Large Language Models (LLMs) across a wide range of tasks, along with their often non-trivial errors, has garnered unprecedented attention regarding the potential of AI and its impact on everyday life. While considerable effort has been and continues to be dedicated to overcoming the limitations of current models, the potentials and risks of human-LLM collaboration remain largely underexplored. In this perspective, we argue that enhancing the focus on human-LLM interaction should be a primary target for future...
Understanding time from visual representations is a fundamental cognitive skill, yet it remains a challenge for multimodal large language models (MLLMs). In this work, we investigate the capabilities of MLLMs in interpreting time and date through analogue clocks and yearly calendars. To facilitate this, we curated a structured dataset comprising two subsets: 1) ClockQA, which comprises various types of clock styles (standard, black-dial, no-second-hand, Roman numeral, and arrow-hand clocks) paired...
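The geometry that analogue-clock reading rests on can be made concrete with a small sketch. The helper below is a hypothetical illustration (not from the paper) that converts hand angles, measured clockwise from 12, into a time:

```python
def hands_to_time(hour_angle_deg, minute_angle_deg):
    """Convert analogue-clock hand angles (degrees clockwise from 12)
    to (hour, minute). The minute hand moves 6 degrees per minute; the
    hour hand moves 30 degrees per hour plus 0.5 degrees per minute."""
    minute = round(minute_angle_deg / 6) % 60
    # Remove the minute hand's contribution before reading the hour.
    hour = round((hour_angle_deg - minute * 0.5) / 30) % 12
    return (hour or 12, minute)  # report 0 o'clock as 12
```

For example, an hour hand at 195 degrees with a minute hand at 180 degrees reads 6:30.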
Large language models (LLMs) remain prone to factual inaccuracies and computational errors, including hallucinations and mistakes in mathematical reasoning. Recent work has augmented LLMs with tools to mitigate these shortcomings, but it often requires curated gold tool-use demonstrations. In this paper, we investigate whether LLMs can learn to use tools without demonstrations. First, we analyse zero-shot prompting strategies to guide LLMs in tool utilisation. Second, we propose a self-training method to synthesise tool-use traces using the LLM itself. We compare...
Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations": outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and compare the tendency of each model to produce hallucinations. The leaderboard uses a comprehensive set of benchmarks focusing on different...
Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we...
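Contrasting a base model with a deliberately degraded variant can be sketched at the logit level. The scoring rule below is a generic contrastive-decoding sketch (amplify what the base model prefers relative to the degraded one); the exact formulation used in the paper may differ:

```python
def contrastive_next_token(logp_base, logp_masked, alpha=1.0):
    """Generic contrastive-decoding score over the vocabulary:
    score_i = (1 + alpha) * logp_base_i - alpha * logp_masked_i.
    logp_masked could come from, e.g., a model with retrieval heads
    masked; tokens the degraded model likes are penalised."""
    return [(1 + alpha) * b - alpha * m
            for b, m in zip(logp_base, logp_masked)]
```

In a two-token toy vocabulary where the base model prefers token 0 and the masked model prefers token 1, the contrast sharpens the preference for token 0.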
Large language models (LLMs) can store a significant amount of factual knowledge in their parameters. However, this parametric knowledge may conflict with the information provided in the context; this phenomenon, known as context-memory conflict, can lead to undesirable model behaviour, such as reliance on outdated or incorrect information. Analysing the internal activations of LLMs, we find that they internally register signals of knowledge conflict at mid-layers. Such signals allow us to detect whether a knowledge conflict occurs and to use inference-time...
Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. However, this approach is increasingly proven to be impractical owing to the substantial computational requirements associated with training such large models. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a viable solution by selectively fine-tuning a small subset of additional parameters, significantly reducing the computational requirements for domain...
Abstract. Argumentation mining is a research field which focuses on classifying sentences by their type of argumentation. Argumentative sentences are often used in daily communication and have an important role in every decision-making or conclusion-making process. The objective of this research is to observe the use of deep learning combined with an attention mechanism for argument annotation and analysis. Argument component classification assigns sentences from a certain discourse to several classes. The classes include major claim, premise, and non-argumentative. The analysis points...
We propose a novel approach to a dataset of argumentation relations. This task is intended to analyze the presence of a support relation between two sentences. To be able to identify relations between sentences or arguments, one is obliged to understand the nuance brought by both of them. Our models are modifications of siamese network architectures, in which we replace the feature extractor with a Long Short-Term Memory network and implement cosine distance as the energy function. The models take a pair of sentences as their input and try to determine whether a support relation holds between them or not. The primary...
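The cosine-distance energy function is simple to state. In the paper the two embeddings would come from the shared LSTM encoder of the siamese network; in this minimal sketch they are plain vectors:

```python
import math

def cosine_energy(u, v):
    """Cosine distance (1 - cosine similarity) between two embeddings,
    used as a siamese-network energy: low for related pairs, high for
    unrelated ones. u and v are non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)
```

Identical directions give energy 0, orthogonal vectors give 1, and opposite directions give 2, so training pushes supporting pairs toward 0.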
In our current time, the well-being of a person is not only determined by their physical health, but also by their mental health. A lot of focus and effort have been spent on raising awareness of this issue. One such effort comes from the field of computer science, utilizing data from social media to provide additional information for detecting these disorders. In this research, the authors proposed Bidirectional Encoder Representations from Transformers (BERT) with extractive summarization to preprocess data obtained from a popular platform such as Reddit...
Facial attractiveness classification has many uses, including photo editing, beautification, grading, and dataset labeling. While facial attractiveness seems to be related to personal preference, building a robust classifier is not impossible. Several studies have developed facial attractiveness classification systems using convolutional neural networks and achieved satisfactory results. ImageNet pre-trained models have been widely used in face-related research, yet none of them for attractiveness. This study...
Predicting personality is a growing topic in the field of natural language processing. Personality prediction has been shown by previous studies to benefit the development of recommender systems and automated assessments. Additionally, the widespread usage of social media in Indonesia, such as Twitter, serves as a potential source of data for developing such models. Existing models have explored the implementation of both traditional machine learning and deep learning models, with the latter performing better given more data. Despite this, there has not been much...
Objective: To investigate GPT-3.5 in generating and coding medical documents with ICD-10 codes for data augmentation on low-resource labels. Materials and Methods: Employing GPT-3.5, we generated and coded 9,606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained and evaluated on a test set. We report micro- and macro-F1 scores on the full codeset, the generation codes, and their...
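Micro- and macro-averaged F1 differ in whether true/false positive counts are pooled across all labels before computing F1, or per-label F1 scores are averaged; the distinction matters precisely for the infrequent codes targeted here. A minimal multi-label implementation:

```python
def micro_macro_f1(gold, pred, labels):
    """Micro- and macro-averaged F1 for multi-label coding.
    gold, pred: lists of label sets, one set per document."""
    def f1(tp, fp, fn):
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    tp_all = fp_all = fn_all = 0
    per_label = []
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if lab in g and lab in p)
        fp = sum(1 for g, p in zip(gold, pred) if lab not in g and lab in p)
        fn = sum(1 for g, p in zip(gold, pred) if lab in g and lab not in p)
        tp_all += tp; fp_all += fp; fn_all += fn
        per_label.append(f1(tp, fp, fn))
    micro = f1(tp_all, fp_all, fn_all)       # pool counts, then F1
    macro = sum(per_label) / len(per_label)  # F1 per label, then mean
    return micro, macro
```

A rare label that is always missed drags macro-F1 down by a full 1/|labels| while barely moving micro-F1, which is why augmentation for low-resource labels is evaluated with both.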
Mathematical reasoning remains a significant challenge for large language models (LLMs), despite progress in prompting techniques such as Chain-of-Thought (CoT). We present Chain of Mathematically Annotated Thought (CoMAT), which enhances reasoning through two stages: Symbolic Conversion (converting natural language queries into symbolic form) and Reasoning Execution (deriving answers from symbolic representations). CoMAT operates entirely with a single LLM and without external solvers. Across four LLMs, CoMAT outperforms...
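The two-stage split (symbolic conversion, then reasoning execution) can be illustrated with a toy arithmetic pipeline. The word-to-operator table and the left-to-right executor below are illustrative assumptions for this sketch, not CoMAT itself, where both stages are performed by the LLM:

```python
import operator

# Illustrative vocabulary mapping words to (symbol, function).
_OPS = {"plus": ("+", operator.add),
        "minus": ("-", operator.sub),
        "times": ("*", operator.mul)}

def symbolic_convert(query):
    """Stage 1 sketch: turn 'What is 10 minus 3 plus 4?' into a
    symbolic expression string plus its token list."""
    tokens = [t for t in query.lower().replace("?", "").split()
              if t.isdigit() or t in _OPS]
    symbols = [t if t.isdigit() else _OPS[t][0] for t in tokens]
    return " ".join(symbols), tokens

def reasoning_execute(tokens):
    """Stage 2 sketch: evaluate the symbols strictly left to right
    (a toy executor with no operator precedence)."""
    acc = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        acc = _OPS[op][1](acc, int(num))
    return acc
```

The point of the separation is that the answer is derived from the explicit symbolic form, not free-form text, making the reasoning step checkable.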
Abstract. Accurate in-silico prediction of protein-ligand binding affinity is essential for efficient hit identification in large molecular libraries. Commonly used structure-based methods such as giga-docking often fail to rank compounds effectively, and free energy-based approaches, while accurate, are too computationally intensive for large-scale screening. Existing deep learning models struggle to generalize to new targets or drugs, and current evaluation practices do not reflect real-world performance...
Automatic Music Transcription (AMT) is becoming more and more popular by the day, and it has piqued interest well beyond academic research. A successful AMT system would be able to bridge a wide range of interactions between people and music, including music education. The goal of this research is to transcribe an audio input into notation. The research was conducted by training neural network architectures on different kinds of cases. The evaluation used two approaches, objective and subjective...
The NLI4CT task assesses Natural Language Inference systems in predicting whether hypotheses entail or contradict evidence from Clinical Trial Reports. In this study, we evaluate various Large Language Models (LLMs) with multiple strategies, including Chain-of-Thought, In-Context Learning, and Parameter-Efficient Fine-Tuning (PEFT). We propose a PEFT method to improve the consistency of LLMs by merging adapters that were fine-tuned separately using triplet and language modelling objectives. We found two...