John Giorgi

ORCID: 0000-0001-9621-5046
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Biomedical Text Mining and Ontologies
  • Bioinformatics and Genomic Networks
  • Advanced Text Analysis Techniques
  • Speech and dialogue systems
  • Speech Recognition and Synthesis
  • Genetics, Bioinformatics, and Biomedical Research
  • Computational Drug Discovery Methods
  • Archaeology and ancient environmental studies
  • Microbial Metabolic Engineering and Bioproduction
  • AI in Service Interactions
  • Global Public Health Policies and Epidemiology
  • Scientific Computing and Data Management
  • Legume Nitrogen Fixing Symbiosis
  • Plant-Microbe Interactions and Immunity
  • Semantic Web and Ontologies
  • Text and Document Classification Technologies
  • Archaeological Research and Protection
  • Explainable Artificial Intelligence (XAI)
  • Machine Learning in Materials Science
  • Geology and Paleoclimatology Research
  • Mycorrhizal Fungi and Plant Interactions
  • Pleistocene-Era Hominins and Archaeology
  • Machine Learning and Data Classification

University of Toronto
2018-2023

Vector Institute
2021-2023

Donnelly College
2021-2023

University of Ottawa
2018

University of New Brunswick
2018

John Giorgi, Osvald Nitski, Bo Wang, Gary Bader. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.72 article EN cc-by 2021-01-01

Summary: Arbuscular mycorrhizal fungi (AMF) are known to improve plant fitness through the establishment of symbioses. Genetic and phenotypic variations among closely related isolates can significantly affect growth, but the genomic changes underlying this variability are unclear. To address this issue, we improved the genome assembly and gene annotation of the model strain Rhizophagus irregularis DAOM 197198, and compared its gene content with five R. irregularis isolates sampled in the same field. All harbor striking variations, with large numbers...

10.1111/nph.14989 article EN publisher-specific-oa New Phytologist 2018-01-22

The explosive increase of biomedical literature has made information extraction an increasingly important tool for biomedical research. A fundamental task is the recognition of biomedical named entities in text (BNER) such as genes/proteins, diseases and species. Recently, a domain-independent method based on deep learning and statistical word embeddings, called long short-term memory network-conditional random field (LSTM-CRF), has been shown to outperform state-of-the-art entity-specific BNER tools. However, this...

10.1093/bioinformatics/bty449 article EN cc-by Bioinformatics 2018-05-29

Automatic biomedical named entity recognition (BioNER) is a key task in biomedical information extraction. For some time, state-of-the-art BioNER has been dominated by machine learning methods, particularly conditional random fields (CRFs), with a recent focus on deep learning. However, recent work has suggested that the high performance of CRFs for BioNER may not generalize to corpora other than the one it was trained on. In our analysis, we find that a popular deep learning-based approach to BioNER, known as bidirectional long short-term...

10.1093/bioinformatics/btz504 article EN cc-by-nc Bioinformatics 2019-06-17

Sentence embeddings are an important component of many natural language processing (NLP) systems. Like word embeddings, sentence embeddings are typically learned on large text corpora and then transferred to various downstream tasks, such as clustering and retrieval. Unlike word embeddings, the highest performing solutions for learning sentence embeddings require labelled data, limiting their usefulness to languages and domains where labelled data is abundant. In this paper, we present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations....

10.48550/arxiv.2006.03659 preprint EN public-domain arXiv (Cornell University) 2020-01-01

Motivated by the fact that many relations cross the sentence boundary, there has been increasing interest in document-level relation extraction (DocRE). DocRE requires integrating information within and across sentences, capturing complex interactions between mentions of entities. Most existing methods are pipeline-based, requiring entities as input. However, jointly learning to extract entities and relations can improve performance and be more efficient due to shared parameters and training steps. In this paper, we develop a...

10.18653/v1/2022.bionlp-1.2 article EN cc-by 2022-01-01

This paper describes our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations. We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and first, respectively, of all submissions to the shared task. Expert human scrutiny indicates that notes generated via...

10.18653/v1/2023.clinicalnlp-1.36 article EN cc-by 2023-01-01

Named entity recognition (NER) and relation extraction (RE) are two important tasks in information extraction and retrieval (IE & IR). Recent work has demonstrated that it is beneficial to learn these tasks jointly, which avoids the propagation of error inherent in pipeline-based systems and improves performance. However, state-of-the-art joint models typically rely on external natural language processing (NLP) tools, such as dependency parsers, limiting their usefulness to domains (e.g. news) where those tools perform...

10.48550/arxiv.1912.13415 preprint EN public-domain arXiv (Cornell University) 2019-01-01

Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also...

10.7554/elife.68292 article EN cc-by eLife 2021-12-03

Abstract. Motivation: The explosive increase of biomedical literature has made information extraction an increasingly important tool for biomedical research. A fundamental task is the recognition of biomedical named entities in text (BNER) such as genes/proteins, diseases, and species. Recently, a domain-independent method based on deep learning and statistical word embeddings, called long short-term memory network-conditional random field (LSTM-CRF), has been shown to outperform state-of-the-art entity-specific BNER tools....

10.1101/262790 preprint EN public-domain bioRxiv (Cold Spring Harbor Laboratory) 2018-02-12

Motivation: Automatic biomedical named entity recognition (BioNER) is a key task in biomedical information extraction (IE). For some time, state-of-the-art BioNER has been dominated by machine learning methods, particularly conditional random fields (CRFs), with a recent focus on deep learning. However, recent work has suggested that the high performance of CRFs for BioNER may not generalize to corpora other than the one it was trained on. In our analysis, we find that a popular deep learning-based approach to BioNER, known as bidirectional...

10.1101/526244 preprint EN public-domain bioRxiv (Cold Spring Harbor Laboratory) 2019-01-22

Multi-document summarization (MDS) assumes a set of topic-related documents are provided as input. In practice, this document set is not always available; it would need to be retrieved given an information need, i.e. a question or topic statement, a setting we dub "open-domain" MDS. We study this more challenging setting by formalizing the task and bootstrapping it using existing datasets, retrievers and summarizers. Via extensive automatic and human evaluation, we determine: (1) state-of-the-art summarizers suffer large...

10.48550/arxiv.2212.10526 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Multi-document summarization (MDS) assumes a set of topic-related documents are provided as input. In practice, this document set is not always available; it would need to be retrieved given an information need, i.e. a question or topic statement, a setting we dub "open-domain" MDS. We study this more challenging setting by formalizing the task and bootstrapping it using existing datasets, retrievers and summarizers. Via extensive automatic and human evaluation, we determine: (1) state-of-the-art summarizers suffer large...

10.18653/v1/2023.findings-emnlp.549 article EN cc-by 2023-01-01

Motivated by the fact that many relations cross the sentence boundary, there has been increasing interest in document-level relation extraction (DocRE). DocRE requires integrating information within and across sentences, capturing complex interactions between mentions of entities. Most existing methods are pipeline-based, requiring entities as input. However, jointly learning to extract entities and relations can improve performance and be more efficient due to shared parameters and training steps. In this paper, we develop a...

10.48550/arxiv.2204.01098 preprint EN public-domain arXiv (Cornell University) 2022-01-01

Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated process to generate high-quality topic pages for scientific entities, with a focus on biomedical concepts. We release TOPICAL, an app and associated open-source code,...

10.48550/arxiv.2405.01796 preprint EN arXiv (Cornell University) 2024-05-02

Abstract. Technological advances in computing provide major opportunities to accelerate scientific discovery. The wide availability of structured knowledge would allow us to take full advantage of these by enabling efficient human-computer interaction. Traditionally, biological knowledge is captured in publications and knowledge bases; however, the information in articles is not directly accessible to computers, and knowledge bases are constrained by the finite resources available for manual curation. To capture communication keep pace with rapid...

10.1101/2021.03.10.382333 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-03-11

Technological advances in computing provide major opportunities to complement human reasoning and dramatically speed up science - but only if structured knowledge is available to enable efficient communication between humans and computers. Traditionally, biological knowledge is captured in publications and knowledge bases. Knowledge in papers is not directly in a computable form; curated knowledge bases are limited by manual curation processes. To accelerate knowledge capture and keep pace with the rapid growth of scientific reports, we developed Biofactoid...

10.31219/osf.io/zep3x preprint EN 2020-11-20

Identifying subcellular biological entities (genes, gene products, and small molecules) is essential in using and creating bioinformatics analysis tools, text mining, and accessible research apps. When information is uniquely and unambiguously identified, it enables data to be accurately retrieved, cross-referenced, and integrated. In practice, entities are identified when they are associated with a matching record from a knowledge base that specialises in collecting and organising information of that type (e.g. genes in NCBI Gene). Our search service...

10.21105/joss.03756 article EN cc-by The Journal of Open Source Software 2021-11-12

The quest for human imitative AI has been an enduring topic in AI research since its inception. The technical evolution and emerging capabilities of the latest cohort of large language models (LLMs) have reinvigorated the subject beyond academia to the cultural zeitgeist. While recent NLP evaluation benchmark tasks test some aspects of human-imitative behaviour (e.g., BIG-bench's 'human-like behavior' tasks), few, if not none, examine creative problem solving abilities. Creative problem solving in humans is a well-studied...

10.48550/arxiv.2306.11167 preprint EN cc-by arXiv (Cornell University) 2023-01-01

This paper describes our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations. We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and first, respectively, of all submissions to the shared task. Expert human scrutiny indicates that notes generated via...

10.48550/arxiv.2305.02220 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The interaction of ‘natural’ environmental processes and human activity in shaping landscapes is vividly illustrated in the Lower Thames Valley, UK. Through development-led (geo)archaeological investigations, the intensifying redevelopment of this (currently) industrial landscape presents opportunities to gain long-term perspectives on these processes and to investigate the timing and extent of human impact on the environment in the past. This paper describes a novel multi-method approach undertaken at the former Littlebrook Power Station, Kent,...

10.1080/14662035.2021.2042050 article EN Landscapes 2021-07-03