- Topic Modeling
- Biomedical Text Mining and Ontologies
- Machine Learning in Healthcare
- Advanced Decision-Making Techniques
- Natural Language Processing Techniques
- Digital Media Forensic Detection
- Evaluation and Optimization Models
- Machine Learning in Bioinformatics
- Genetics, Bioinformatics, and Biomedical Research
- AI in Cancer Detection
- Evaluation Methods in Various Fields
- Machine Learning and Algorithms
- Video Analysis and Summarization
- Artificial Intelligence in Law
- Privacy-Preserving Technologies in Data
- Advanced Steganography and Watermarking Techniques
- Radiomics and Machine Learning in Medical Imaging
- Computational Drug Discovery Methods
- Thermal and Kinetic Analysis
- Adversarial Robustness in Machine Learning
- Advanced Text Analysis Techniques
- Artificial Intelligence in Healthcare
- Imbalanced Data Classification Techniques
- Vaccines and Immunoinformatics Approaches
- Protein Structure and Dynamics
Deakin University
2022-2025
University of Arizona
2025
Oak Ridge National Laboratory
2017-2024
Inner Mongolia Normal University
2023-2024
Dalian Maritime University
2024
Beijing Electronic Science and Technology Institute
2023
China Southern Power Grid (China)
2018
Beijing University of Chemical Technology
2018
Wuhan University
2012-2015
University of Calgary
2009-2013
Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state-of-the-art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these against two much simpler architectures - a word-level...
We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports compared to conventional methods that do not sufficiently capture the syntactic and semantic contexts of free-text documents. Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2...
We examine the problem of clustering biomolecular simulations using deep learning techniques. Since simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. We use a convolutional variational autoencoder (CVAE) to learn biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner....
Objective: We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficiency. Materials and Methods: Multitask CNN...
In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying models in the real world have recently received considerable attention. In the NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports...
Inspired by the evolution of biological systems, genetic algorithms have been applied to generate solutions for optimization problems in a variety of scientific and engineering disciplines. For a given problem, a suitable genome representation must be defined along with a mutation operator to generate subsequent generations. Unlike natural systems, which display complex rearrangements (e.g., mobile elements), the mutation operator commonly utilizes only random pointwise changes. Furthermore, generalizing beyond pointwise mutations poses a key difficulty...
Recent work in machine translation has demonstrated that self-attention mechanisms can be used in place of recurrent neural networks to increase training speed without sacrificing model accuracy. We propose combining this approach with the benefits of convolutional filters and a hierarchical structure to create a document classification model that is both highly accurate and fast to train; we name our method Hierarchical Convolutional Attention Networks. We demonstrate the effectiveness of this architecture by surpassing the accuracy...
We introduce a deep learning architecture, hierarchical self-attention networks (HiSANs), designed for classifying pathology reports and show how its unique architecture leads to a new state-of-the-art in accuracy, faster training, and clear interpretability. We evaluate performance on a corpus of 374,899 pathology reports obtained from the National Cancer Institute's (NCI) Surveillance, Epidemiology, and End Results (SEER) program. Each pathology report is associated with five clinical classification tasks - site, laterality,...
Named entity recognition (NER) is a key component of many scientific literature mining tasks, such as information retrieval, information extraction, and question answering; however, many modern approaches require large amounts of labeled training data in order to be effective. This severely limits the effectiveness of NER models in applications where expert annotations are difficult and expensive to obtain. In this work, we explore the effectiveness of transfer learning and semi-supervised self-training to improve the performance of NER models in biomedical settings with...
Population cancer registries can benefit from Deep Learning (DL) to automatically extract cancer characteristics from the high volume of unstructured pathology text reports they process annually. The success of DL to tackle this and other real-world problems is proportional to the availability of large labeled datasets for model training. Although collaboration among cancer registries is essential to fully exploit the promise of DL, privacy and confidentiality concerns are the main obstacles for data sharing across cancer registries. Moreover, natural language...
The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted binding affinity. We pre-trained a model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an...
Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the model. We compare the performance of each...
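As a rough illustration of the active learning idea described above, the sketch below ranks unlabelled documents by least-confidence uncertainty, a common selection strategy. Whether it is one of the 11 algorithms in the study is not stated in the excerpt, so treat it as an assumption; the function names and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def least_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Uncertainty per document: 1 minus the highest predicted class probability."""
    return [1.0 - max(row) for row in probabilities]


def select_for_labeling(probabilities: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` most uncertain documents in the pool."""
    scores = least_confidence_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical class probabilities predicted by a classifier on an unlabelled pool.
pool_probabilities = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],  # most uncertain
]

# Documents chosen for annotation in the next active learning round.
selected = select_for_labeling(pool_probabilities, batch_size=2)
print(selected)
```

In a full loop, the selected documents would be labelled, added to the training set, and the model retrained before the next round of selection.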
Attention mechanisms are now a mainstay architecture in neural networks and improve the performance of biomedical text classification tasks. In particular, models that perform automated medical encoding of clinical documents make extensive use of the label-wise attention mechanism. A label-wise attention mechanism increases a model's discriminatory ability by using label-specific reference information. This information can either be implicitly learned during training or explicitly provided through embedded textual code...
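A minimal sketch of label-wise attention may help make the mechanism above concrete. It assumes dot-product scoring between token representations and one reference (query) vector per label; the excerpt does not specify the scoring function, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(values: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def label_wise_attention(token_states: List[Vector], label_queries: List[Vector]) -> List[Vector]:
    """Build one document representation per label.

    Each label's query vector scores every token; the softmax-normalized
    scores weight the token states into a label-specific context vector.
    """
    dim = len(token_states[0])
    contexts = []
    for query in label_queries:
        weights = softmax([dot(state, query) for state in token_states])
        context = [
            sum(w * state[j] for w, state in zip(weights, token_states))
            for j in range(dim)
        ]
        contexts.append(context)
    return contexts


# Hypothetical token representations (3 tokens, 2 dimensions) and 2 label queries.
token_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
label_queries = [[2.0, 0.0], [0.0, 2.0]]
contexts = label_wise_attention(token_states, label_queries)
print(contexts)
```

The label queries correspond to the label-specific reference information: they could be learned parameters, or, as the abstract suggests, derived from embedded textual descriptions of each code.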
Surgical pathology reports are critical for cancer diagnosis and management. To accurately extract information about tumor characteristics from pathology reports in near real time, we explore the impact of using domain-specific transformer models that understand pathology reports.
Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports but also to capture aggregate information regarding the entire case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing...
Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of classes dominate. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e. keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set...
Automated text information extraction from cancer pathology reports is an active area of research to support national cancer surveillance. A well-known challenge is how to develop information extraction tools with robust performance across cancer registries. In this study we investigated whether transfer learning (TL) with a convolutional neural network (CNN) can facilitate cross-registry knowledge sharing. Specifically, we performed a series of experiments to determine whether a CNN trained with single-registry data is capable of transferring knowledge to another registry or...