- Topic Modeling
- Natural Language Processing Techniques
- Biomedical Text Mining and Ontologies
- Advanced Text Analysis Techniques
- Semantic Web and Ontologies
- Speech and Dialogue Systems
- Text Readability and Simplification
- Speech Recognition and Synthesis
- Advanced Graph Neural Networks
- Handwritten Text Recognition Techniques
- Text and Document Classification Technologies
- Genomics and Phylogenetic Studies
- Protist Diversity and Phylogeny
- RNA and protein synthesis mechanisms
- Expert finding and Q&A systems
- Multimodal Machine Learning Applications
- Authorship Attribution and Profiling
- Pharmacogenetics and Drug Metabolism
- Advanced Image and Video Retrieval Techniques
- Spam and Phishing Detection
- Time Series Analysis and Forecasting
- Neural Networks and Applications
- Hate Speech and Cyberbullying Detection
- Sentiment Analysis and Opinion Mining
- Privacy, Security, and Data Protection
Google (United States)
2021-2023
University of California, Santa Barbara
2023
University of Rochester
2023
Allen Institute
2018-2020
Allen Institute for Artificial Intelligence
2017-2019
Northwestern University
2018
Carnegie Mellon University
2012-2017
Laboratoire d'Informatique de Paris-Nord
2017
Johns Hopkins University
2017
The University of Tokyo
2017
Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust, practical, publicly available models. This paper describes scispaCy, a new tool for practical biomedical/scientific text processing, which heavily leverages the spaCy library. We detail the performance of two packages of models released in scispaCy and demonstrate their robustness on several...
Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two datasets for named entity...
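The augmentation described above can be sketched in a few lines: each token's input is its word embedding concatenated with hidden states from pre-trained forward and backward language models. This is a toy illustration with hand-made vectors, not the paper's architecture; all dimensions and values are hypothetical.

```python
# Sketch: augment word embeddings with pre-trained bidirectional LM states.
# All vectors below are toy, hand-made values (hypothetical).

def augment_with_lm(word_embs, fwd_lm_states, bwd_lm_states):
    """Concatenate [word_emb; forward LM state; backward LM state] per token."""
    assert len(word_embs) == len(fwd_lm_states) == len(bwd_lm_states)
    return [w + f + b for w, f, b in zip(word_embs, fwd_lm_states, bwd_lm_states)]

# Toy sentence of 2 tokens: word embeddings of dim 3, LM states of dim 2 each.
word_embs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
fwd = [[1.0, 1.1], [1.2, 1.3]]
bwd = [[2.0, 2.1], [2.2, 2.3]]

augmented = augment_with_lm(word_embs, fwd, bwd)
print(len(augmented), len(augmented[0]))  # 2 tokens, each of dim 3+2+2 = 7
```

The sequence labeler (e.g. a CRF or BiLSTM tagger) would then consume these augmented vectors in place of plain word embeddings.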
We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, leaving the user free to use different...
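The dynamic-declaration idea can be illustrated with a minimal toy autodiff sketch (this is not DyNet's actual API): the graph is built implicitly by running ordinary procedural code, so a different input can yield a different graph structure.

```python
# Toy dynamic computation graph: nodes are created as ordinary code runs.

class Node:
    def __init__(self, value, parents=(), grad_fn=None):
        self.value = value
        self.parents = parents
        self.grad_fn = grad_fn  # maps upstream grad -> grads for parents
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), lambda g: (g, g))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    lambda g: (g * other.value, g * self.value))

def backward(root):
    # Simple traversal; assumes each intermediate node feeds one consumer
    # (true for the graph built below), which keeps this demo short.
    root.grad = 1.0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.grad_fn is None:
            continue
        for parent, g in zip(node.parents, node.grad_fn(node.grad)):
            parent.grad += g
            stack.append(parent)

# The graph's shape depends on the input length -- declared per example.
def score(xs, w):
    total = Node(0.0)
    for x in xs:
        total = total + Node(x) * w
    return total

w = Node(2.0)
y = score([1.0, 2.0, 3.0], w)
backward(y)
print(y.value, w.grad)  # 12.0 and d(sum_i x_i * w)/dw = 1+2+3 = 6.0
```

The point mirrored from the abstract: there is no separate "define graph, then feed data" phase; executing `score` on each example *is* the graph construction.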
Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, Oren Etzioni. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry...
We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal that will facilitate further research in this area, along with open-source releases of all our methods.
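The clustering step behind a multiCluster-style method can be sketched as follows: translation dictionary entries link words across languages into clusters, and every word in a cluster would share one embedding slot. This is a rough sketch of the idea under assumed details, using union-find over toy dictionary pairs, not the paper's implementation.

```python
# Union-find over bilingual dictionary pairs -> multilingual word clusters.

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb

# Toy translation dictionary entries as (lang:word, lang:word) pairs.
pairs = [("en:dog", "fr:chien"), ("fr:chien", "es:perro"), ("en:cat", "fr:chat")]

words = {w for p in pairs for w in p}
parent = {w: w for w in words}
for a, b in pairs:
    union(parent, a, b)

clusters = {}
for w in words:
    clusters.setdefault(find(parent, w), set()).add(w)

# Each cluster would then receive a single shared embedding vector.
sizes = sorted(len(c) for c in clusters.values())
print(sizes)  # [2, 3]: {cat, chat} and {dog, chien, perro}
```

Transitivity through the dictionary (dog–chien, chien–perro) is what places words from languages with no direct dictionary entry into the same cluster.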
We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s...
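The three-part input representation named above can be sketched as a simple vector concatenation; the vectors, language inventory, and tag set below are toy stand-ins, not the parser's actual features.

```python
# Sketch: token input = word embedding (+) language one-hot (+) POS one-hot.

LANGS = ["en", "fr", "de"]          # hypothetical language inventory
POS = ["NOUN", "VERB", "ADJ"]       # hypothetical fine-grained tag set

def one_hot(value, vocab):
    return [1.0 if v == value else 0.0 for v in vocab]

def token_input(word_emb, lang, fine_pos):
    # (i) multilingual word embedding, (ii) token-level language info,
    # (iii) language-specific fine-grained POS feature
    return word_emb + one_hot(lang, LANGS) + one_hot(fine_pos, POS)

vec = token_input([0.5, -0.5], "fr", "NOUN")
print(vec)  # [0.5, -0.5, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

Because the word embedding lives in a shared multilingual space while the language indicator lets the model specialize, one parameter set can both share structure across languages and keep language-specific behavior.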
Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Chandra Bhagavatula, Sergey Feldman, Russell Power, Waleed Ammar. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for figure extraction. In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention. To accomplish this, we leverage the auxiliary data provided in two large web collections of scientific documents (arXiv and PubMed) to locate figures and their associated captions in the rasterized PDF. We share the resulting...
Online discussion forums, known as forums for short, are conversational social cyberspaces constituting rich repositories of content and an important source of collaborative knowledge. However, most of this knowledge is buried inside the forum infrastructure, and its extraction is both complex and difficult. The ability to automatically rate postings in online discussions based on the value of their contribution enhances users' ability to find knowledge within this content. Several key applications have utilized the collective intelligence of contributions made by users. In a large...
Chu-Cheng Lin, Waleed Ammar, Chris Dyer, Lori Levin. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.
This paper describes our submission for the ScienceIE shared task (SemEval-2017 Task 10) on entity and relation extraction from scientific papers. Our model is based on the end-to-end relation extraction model of Miwa and Bansal (2016) with several enhancements such as semi-supervised learning via neural language models, character-level encoding, gazetteers extracted from existing knowledge bases, and model ensembles. Our official submission ranked first in the end-to-end entity and relation extraction scenario (scenario 1) and second in the relation-only scenario (scenario 3).
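One of the enhancements named above, gazetteer features, can be sketched as a longest-match lookup of token spans against a list of known terms. The gazetteer entries and the flag scheme below are illustrative assumptions, not the submission's actual feature set.

```python
# Sketch: flag tokens that fall inside any gazetteer match.

GAZETTEER = {("support", "vector", "machine"), ("neural", "network")}  # toy entries
MAX_LEN = max(len(entry) for entry in GAZETTEER)

def gazetteer_flags(tokens):
    """Return one boolean per token: True if inside any gazetteer match."""
    flags = [False] * len(tokens)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + MAX_LEN, len(tokens) + 1)):
            if tuple(t.lower() for t in tokens[i:j]) in GAZETTEER:
                for k in range(i, j):
                    flags[k] = True
    return flags

tokens = "We train a neural network classifier".split()
print(gazetteer_flags(tokens))  # [False, False, False, True, True, False]
```

In a tagger these flags would typically be appended to each token's feature vector alongside character-level and language-model features.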
Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need to de-duplicate entities when merging ontologies. We propose a method for enriching an ontology with external definition and context information, and use this additional information for ontology alignment. We develop a neural architecture capable of encoding the additional information when available, and show that the addition of external data results in an F1-score of 0.69 on the Ontology Alignment Evaluation Initiative...
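The enrichment idea can be sketched without the neural encoder: attach each entity's definition text to its name and compare the enriched representations. The bag-of-words cosine below, and the example entities, are simplifying assumptions standing in for the paper's learned encoder.

```python
# Sketch: align ontology entities by similarity of name + definition text.
import math
from collections import Counter

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

def enrich(name, definition):
    # Entity representation = bag of words over name plus definition.
    return bow(name + " " + definition)

e1 = enrich("myocardial infarction", "necrosis of heart muscle from ischemia")
e2 = enrich("heart attack", "death of heart muscle due to loss of blood supply")
e3 = enrich("femur", "long bone of the thigh")

print(cosine(e1, e2) > cosine(e1, e3))  # True: definitions share vocabulary
```

The names "myocardial infarction" and "heart attack" share no tokens, so the match here comes entirely from the definitions, which is the motivation for the enrichment.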
Iz Beltagy, Kyle Lo, Waleed Ammar. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Linguistic borrowing is the phenomenon of transferring linguistic constructions (lexical, phonological, morphological, and syntactic) from a "donor" language to a "recipient" language as a result of contacts between communities speaking different languages. Borrowed words are found in all languages, and, in contrast to cognate relationships, borrowing relationships may exist across unrelated languages (for example, about 40% of Swahili's vocabulary is borrowed from Arabic). In this paper, we develop a model of morpho-phonological...
Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet, and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase (PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of PP attachment by 5.4% absolute points,...
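The token representation described above can be sketched as an expectation over concept embeddings: sum over senses of p(sense | context) times the sense's embedding. The sense inventory, probabilities, and vectors below are toy values, and the context model that produces the distribution is assumed, not shown.

```python
# Sketch: token embedding = sum_s p(s | context) * emb(s) over WordNet-style senses.

def token_embedding(sense_probs, sense_embs):
    """Weighted sum of sense embeddings under a context-dependent distribution."""
    dim = len(next(iter(sense_embs.values())))
    vec = [0.0] * dim
    for sense, p in sense_probs.items():
        for i, v in enumerate(sense_embs[sense]):
            vec[i] += p * v
    return vec

# Toy 2-d embeddings for two hypothetical senses of "bank".
sense_embs = {"bank.n.01": [1.0, 0.0],   # river bank
              "bank.n.02": [0.0, 1.0]}   # financial institution

# In "deposit money at the bank", context would favor the financial sense:
probs = {"bank.n.01": 0.1, "bank.n.02": 0.9}

vec = token_embedding(probs, sense_embs)
print(vec)  # [0.1, 0.9]: pulled toward the financial-sense embedding
```

Two tokens of the same word thus get different vectors whenever their contexts induce different sense distributions, which is exactly what type-level embeddings cannot do.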
We describe the CMU submission for the 2014 shared task on language identification in code-switched data. We participated in all four language pairs: Spanish‐English, Mandarin‐English, Nepali‐English, and Modern Standard Arabic‐Arabic dialects. After describing our CRF-based baseline system, we discuss three extensions for learning from unlabeled data: semi-supervised learning, word embeddings, and word lists.
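A CRF baseline for token-level language identification rests on per-token feature extraction; the sketch below shows the kind of surface features such a system might use. The feature names and the example are illustrative, not the submission's actual feature set.

```python
# Sketch: per-token features for CRF-based language ID in code-switched text.

def token_features(tokens, i):
    t = tokens[i]
    return {
        "lower=" + t.lower(): 1.0,          # word identity
        "suffix3=" + t.lower()[-3:]: 1.0,   # character suffix
        "is_upper": float(t.isupper()),
        "is_title": float(t.istitle()),
        # neighboring-word context, useful at code-switch boundaries:
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1.0,
    }

tokens = "yo quiero ice cream".split()
feats = token_features(tokens, 2)
print("prev=quiero" in feats, feats["suffix3=ice"])  # True 1.0
```

A CRF then scores whole label sequences over these features, so a Spanish neighbor ("quiero") can lower the score of labeling "ice" as Spanish even when the word itself is ambiguous.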
Rahul Goel, Waleed Ammar, Aditya Gupta, Siddharth Vashishtha, Motoki Sano, Faiz Surani, Max Chang, HyunJeong Choe, David Greene, Chuan He, Rattima Nitisaroj, Anna Trukhina, Shachi Paul, Pararth Shah, Rushin Shah, Zhou Yu. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.