Waleed Ammar

ORCID: 0000-0003-3541-6981
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Biomedical Text Mining and Ontologies
  • Advanced Text Analysis Techniques
  • Semantic Web and Ontologies
  • Speech and dialogue systems
  • Text Readability and Simplification
  • Speech Recognition and Synthesis
  • Advanced Graph Neural Networks
  • Handwritten Text Recognition Techniques
  • Text and Document Classification Technologies
  • Genomics and Phylogenetic Studies
  • Protist diversity and phylogeny
  • RNA and protein synthesis mechanisms
  • Expert finding and Q&A systems
  • Multimodal Machine Learning Applications
  • Authorship Attribution and Profiling
  • Pharmacogenetics and Drug Metabolism
  • Advanced Image and Video Retrieval Techniques
  • Spam and Phishing Detection
  • Time Series Analysis and Forecasting
  • Neural Networks and Applications
  • Hate Speech and Cyberbullying Detection
  • Sentiment Analysis and Opinion Mining
  • Privacy, Security, and Data Protection

Google (United States)
2021-2023

University of California, Santa Barbara
2023

University of Rochester
2023

Allen Institute
2018-2020

Allen Institute for Artificial Intelligence
2017-2019

Northwestern University
2018

Carnegie Mellon University
2012-2017

Laboratoire d'Informatique de Paris-Nord
2017

Johns Hopkins University
2017

The University of Tokyo
2017

Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust, practical, publicly available models. This paper describes scispaCy, a new tool for practical biomedical/scientific text processing, which heavily leverages the spaCy library. We detail the performance of two packages of models released in scispaCy and demonstrate their robustness on several...

10.18653/v1/w19-5034 preprint EN 2019-01-01

Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pretrained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity...

10.18653/v1/p17-1161 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different...

10.48550/arxiv.1701.03980 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, Oren Etzioni. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry...

10.18653/v1/n18-3011 article EN cc-by 2018-01-01

We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal that will facilitate further research in this area, along with open-source releases of all our methods.

10.48550/arxiv.1602.01925 preprint EN other-oa arXiv (Cornell University) 2016-01-01

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s...

10.1162/tacl_a_00109 article EN cc-by Transactions of the Association for Computational Linguistics 2016-12-01

Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1361 article EN 2019-01-01

Chandra Bhagavatula, Sergey Feldman, Russell Power, Waleed Ammar. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

10.18653/v1/n18-1022 article EN cc-by 2018-01-01

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

10.18653/v1/n18-1149 article EN cc-by 2018-01-01

Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention. To accomplish this, we leverage the auxiliary data provided in two large web collections of scientific documents (arXiv and PubMed) to locate figures and their associated captions in the rasterized PDF. We share the resulting...

10.1145/3197026.3197040 preprint EN 2018-05-23

Online discussion forums, known as forums for short, are conversational social cyberspaces constituting rich repositories of content and an important source of collaborative knowledge. However, most of this knowledge is buried inside the forum infrastructure, and its extraction is both complex and difficult. The ability to automatically rate postings in online discussion forums, based on the value of their contribution, enhances the ability of users to find knowledge within this content. Several key online discussion forums have utilized collaborative intelligence to rate the value of postings made by users. However, a large...

10.1145/1458527.1458534 article EN 2008-10-30

Chu-Cheng Lin, Waleed Ammar, Chris Dyer, Lori Levin. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015.

10.3115/v1/n15-1144 preprint EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

This paper describes our submission for the ScienceIE shared task (SemEval-2017 Task 10) on entity and relation extraction from scientific papers. Our model is based on the end-to-end relation extraction model of Miwa and Bansal (2016) with several enhancements such as semi-supervised learning via neural language models, character-level encoding, gazetteers extracted from existing knowledge bases, and model ensembles. Our official submission ranked first in end-to-end entity and relation extraction (scenario 1), and second in relation-only extraction (scenario 3).

10.18653/v1/s17-2097 article EN cc-by Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017-01-01

Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need to de-duplicate entities when merging ontologies. We propose a method for enriching entities in an ontology with external definition and context information, and use this additional information for ontology alignment. We develop a neural architecture capable of encoding the additional information when available, and show that the addition of external data results in an F1-score of 0.69 on the Ontology Alignment Evaluation Initiative...

10.18653/v1/w18-2306 article EN cc-by 2018-01-01

Iz Beltagy, Kyle Lo, Waleed Ammar. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1184 article EN 2019-01-01

Linguistic borrowing is the phenomenon of transferring linguistic constructions (lexical, phonological, morphological, and syntactic) from a "donor" language to a "recipient" language as a result of contacts between communities speaking different languages. Borrowed words are found in all languages, and, in contrast to cognate relationships, borrowing relationships may exist across unrelated languages (for example, about 40% of Swahili's vocabulary is borrowed from Arabic). In this paper, we develop a model of morpho-phonological...

10.3115/v1/n15-1062 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase (PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of the PP attachment model by 5.4% absolute points,...

10.18653/v1/p17-1191 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

We describe the CMU submission for the 2014 shared task on language identification in code-switched data. We participated in all four language pairs: Spanish-English, Mandarin-English, Nepali-English, and Modern Standard Arabic-Arabic dialects. After describing our CRF-based baseline system, we discuss three extensions for learning from unlabeled data: semi-supervised learning, word embeddings, and word lists.

10.3115/v1/w14-3909 article EN 2014-01-01

Rahul Goel, Waleed Ammar, Aditya Gupta, Siddharth Vashishtha, Motoki Sano, Faiz Surani, Max Chang, HyunJeong Choe, David Greene, Chuan He, Rattima Nitisaroj, Anna Trukhina, Shachi Paul, Pararth Shah, Rushin Shah, Zhou Yu. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

10.18653/v1/2023.emnlp-main.667 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01