Isaac Johnson

ORCID: 0000-0002-8869-3010
Research Areas
  • Wikis in Education and Collaboration
  • Natural Language Processing Techniques
  • Topic Modeling
  • Cancer-related gene regulation
  • Open Source Software Innovations
  • Human Mobility and Location-Based Analysis
  • Hate Speech and Cyberbullying Detection
  • Digital Games and Media
  • Recommender Systems and Techniques
  • Semantic Web and Ontologies
  • Sentiment Analysis and Opinion Mining
  • Complex Network Analysis Techniques
  • Digital Marketing and Social Media
  • Geographic Information Systems Studies
  • Social Media and Politics
  • Pneumonia and Respiratory Infections
  • Web and Library Services
  • Knowledge Management and Sharing
  • Image Processing and 3D Reconstruction
  • Mental Health via Writing
  • Digital Communication and Language
  • Tuberculosis Research and Epidemiology
  • Urban Transport and Accessibility
  • Data-Driven Disease Surveillance
  • FinTech, Crowdfunding, Digital Finance

Texas MicroPower (United States)
2025

Wikimedia Foundation
2019-2024

Stellenbosch University
2023

University of Minnesota System
2015-2021

Northwestern University
2017-2018

Twin Cities Orthopedics
2016

University of Minnesota
2016

Reliant Medical Group
2016

University of Pittsburgh
2012

Providence College
2012

Teven Le Scao Angela Fan Christopher Akiki Ellie Pavlick Suzana Ilić and 95 more Daniel Hesslow Roman Castagné Alexandra Sasha Luccioni François Yvon Matthias Gallé Jonathan Tow Alexander M. Rush Stella Biderman Albert Webson Pawan Sasanka Ammanamanchi Thomas J. Wang Benoît Sagot Niklas Muennighoff A. Villanova del Moral Olatunji Ruwase Rachel Bawden Stas Bekman Angelina McMillan-Major Iz Beltagy Huu Du Nguyen Lucile Saulnier Samson Tan Pedro Ortiz Suarez Victor Sanh Hugo Laurençon Yacine Jernite Julien Launay Margaret Mitchell Colin Raffel Aaron Gokaslan Adi Simhi Aitor Soroa Alham Fikri Aji Amit Alfassy Anna Rogers Ariel Kreisberg Nitzav Canwen Xu Chenghao Mou Chris Chinenye Emezue Christopher Klamm Colin Leong Daniel van Strien David Ifeoluwa Adelani Dragomir Radev Eduardo González Ponferrada Efrat Levkovizh Ethan Kim Eyal Bar Natan Francesco De Toni Gérard Dupont Germán Kruszewski Giada Pistilli Hady Elsahar Hamza Benyamina Hieu Tran Ian Yu Idris Abdulmumin Isaac Johnson Itziar González-Dios Javier de la Rosa Jenny Chim Jesse Dodge Jianguo Zhu Jonathan Chang Jörg Frohberg Joseph Tobing Joydeep Bhattacharjee Khalid Almubarak Kimbo Chen Kyle Lo Leandro von Werra Leon Weber Long Phan Loubna Ben Allal Ludovic Tanguy Manan Dey Manuel Romero Muñoz Maraim Masoud María Grandury Mario Šaško Max Tze Han Huang Maximin Coavoux Mayank Singh Mike Tian-Jian Jiang Minh Chien Vu Mohammad Ali Jauhar Mustafa Ghaleb Nishant Subramani Nora Kassner Nurulaqilla Khamis Olivier Nguyen Omar Espejel Ona De Gibert Paulo Villegas Peter Henderson

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS...

10.48550/arxiv.2211.05100 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Computational approaches to text analysis are useful in understanding aspects of online interaction, such as opinions and subjectivity in text. Yet, recent studies have identified various forms of bias in language-based models, raising concerns about the risk of propagating social biases against certain groups based on sociodemographic factors (e.g., gender, race, geography). In this study, we contribute a systematic examination of the application of language models to study discourse on aging. We analyze the treatment...

10.1145/3173574.3173986 article EN 2018-04-20

Emoji are commonly used in modern text communication. However, as graphics with nuanced details, emoji may be open to interpretation. Emoji also render differently on different viewing platforms (e.g., Apple’s iPhone vs. Google’s Nexus phone), potentially leading to communication errors. We explore whether emoji renderings or differences across platforms give rise to diverse interpretations of emoji. Through an online survey, we solicit people’s interpretations of a sample of the most popular emoji characters, each rendered for multiple...

10.1609/icwsm.v10i1.14757 article EN Proceedings of the International AAAI Conference on Web and Social Media 2021-08-04

While Wikipedia is a subject of great interest in the computing literature, very little work has considered Wikipedia’s important relationships with other information technologies like search engines. In this paper, we report the results of two deception studies whose goal was to better understand the critical relationship between Wikipedia and Google. These studies silently removed Wikipedia content from Google search results and examined the effect of doing so on participants’ interactions with both websites. Our findings demonstrate and characterize an...

10.1609/icwsm.v11i1.14883 article EN Proceedings of the International AAAI Conference on Web and Social Media 2017-05-03

Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. In this paper, we explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality and is less likely to have been produced by contributors who focus on local...

10.1145/2858036.2858123 preprint EN 2016-05-05

Geotagged tweets and other forms of social media volunteered geographic information (VGI) are becoming increasingly critical to many applications and scientific studies. An important assumption underlying much of this research is that social media VGI is "local", or that its geotags correspond closely with the general home locations of its contributors. We demonstrate through a study on three separate social media communities (Twitter, Flickr, Swarm) that this localness assumption holds in only about 75% of cases. In addition, we show that the contours of localness follow...

10.1145/2858036.2858122 article EN 2016-05-05

Much research has shown that social media platforms have substantial population biases. However, very little is known about how these population biases affect the many algorithms that rely on social media data. Focusing on the case study of geolocation inference algorithms and their performance across the urban-rural spectrum, we establish that these algorithms exhibit significantly worse performance for underrepresented populations (i.e. rural users). We further establish that this finding is robust across both text- and network-based algorithm designs. We also show that some of this bias can be attributed to the design...

10.1145/3025453.3026015 article EN 2017-05-02

The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and is grounded by an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we...

10.1145/3531146.3534637 article EN 2022 ACM Conference on Fairness, Accountability, and Transparency 2022-06-20

The extensive Wikipedia literature has largely considered Wikipedia in isolation, outside of the context of its broader Internet ecosystem. Very recent research has demonstrated the significance of this limitation, identifying critical relationships between Google and Wikipedia that are highly relevant to many areas of Wikipedia-based research and practice. This paper extends this recent research beyond search engines to examine Wikipedia's relationships with large-scale online communities, Stack Overflow and Reddit in particular. We find evidence of consequential, albeit unidirectional...

10.1145/3173574.3174140 article EN 2018-04-20

Recent studies have identified various forms of bias in language-based models, raising concerns about the risk of propagating social biases against certain groups based on sociodemographic factors (e.g., gender, race, geography). In this study, we analyze the treatment of age-related terms across 15 sentiment analysis models and 10 widely-used GloVe word embeddings and attempt to alleviate bias through a method of processing model training data. Our results show that significant age bias is encoded in the outputs of many algorithms...

10.24963/ijcai.2019/852 article EN 2019-07-28

Subsea oil and gas production is an exciting area of growth for companies operating in deepwater across the world. As energy demands increase, improving subsea production could play a crucial role in supplying the world's energy. This paper aims to show that Drag Reducing Agents (DRA) could help improve subsea production by analyzing historical DRA performance in multiphase applications, and to provide recommendations on which subsea systems could benefit from the use of DRA. While subsea DRA injection has not been implemented, this paper summarizes the development,...

10.4043/35767-ms article EN Offshore Technology Conference 2025-04-28

The Mayo Clinic participated in the Depression Improvement Across Minnesota, Offering a New Direction model at two Family Clinics, that is, the Rochester Northwest and Northeast sites. Although the clinics demonstrated the best 6-month remission rates in the state during the first year of implementation, they were retrospectively found to differ on several process issues and measures related to the populations served. Six-month significantly better clinic; yet, had more patient contacts. Differences activation into care...

10.1097/jac.0b013e31820f63cb article EN Journal of Ambulatory Care Management 2011-04-01

Millions of people use platforms such as Google Maps to search for routes to their desired destinations. Recently, researchers and mapping platforms have shown growing interest in optimizing routes for criteria other than travel time, e.g. simplicity, safety, and beauty. However, despite the ubiquity of algorithmic routing and its potential to define how millions of people move around the world, very little is known about the externalities that arise when adopting these new optimization criteria, e.g. redistribution of traffic to certain neighborhoods...

10.1145/3090080 article EN Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies 2017-06-30

Background: To improve tuberculosis (TB) diagnosis, the World Health Organisation (WHO) has called for a non-sputum based triage test to focus TB testing on people with a high likelihood of having active pulmonary tuberculosis (TB). Various host or pathogen biomarker-based devices are in the design stage and require validity assessment. Host biomarkers have shown promise to accurately rule out TB, but further research is required to determine generalisability. The TriageTB diagnostic study aims to assess...

10.1186/s12879-023-08342-5 article EN cc-by BMC Infectious Diseases 2023-07-03
Jayne S. Sutherland Gian van der Spuy Jane Shaw Tracy Richardson Elisa M. Tjon Kon Fat and 95 more Awa Gindeh Olumuyiwa Owolabi Nguyễn Thụy Thương Thương Le Hong Van Van Hoang Nguyen Dang Thao Harriet Mayanja-Kizza Mary Nsereko AnnRitah Namuganga Sophie Nalukwago John T. Belisle Emmanuel Moreau Adam Penn‐Nicholson Guy Thwaites Jill Winter Hazel M. Dockrell Thomas J. Scriba Kim Stanley Bronwyn Smith Novel N. Chegou Stephanus T. Malherbe Annemieke Geluk Paul L. A. M. Corstjens Gerhard Walzl Jayne S. Sutherland Olumuyiwa Owolabi Amie Secka Bintou Njai Abdou K. Sillah Georgetta K. Daffeh Awa Gindeh Amadou Barry Momodou Rashid Joseph Mendy Binta Sarr Abi-Janet Riley Alhaji Jobe Monica Davies Kairaba Kanyi Momodou W. Jallow Salieu Barry Ousainou Cham Gerhard Walzl Stephanus T. Malherbe Bronwyn Smith Gian van der Spuy Kim Stanley Jane Shaw Alicia Chetram Tracy Richardson Bernadine Fransman Isaac Johnson Marika Finn Andriëtte Hiemstra Novel N. Chegou Helena Kuivaniemi Gerard Tromp Susanne Tönsing Elizma Smit Balie Carstens Harriet Mayanja‐Kizza Mary Nsereko AnnRitah Namuganga Sophie Nalukwago Joseph Akol Saidah Menya Veronica Kizza Yusuf Kironde Deborah Banturaki Immaculate Nahereza Simon Okiror Immaculate Kemigisha Paul Mutumba Henry Ojiambo Lilian Murungi Joan Nassuna Gladys Mpalanyi Michael Odie Guy Thwaites Nguyễn Thụy Thương Thương Van Le Son Vo Thanh Hau Nguyen Thi Ha Vu Thi Ngoc Ngoc Le Hong John T. Belisle Karen M. Dobos Hazel M. Dockrell Thomas J. Scriba Mark Hatherill Kate Hadley Justin Shenje Stanley Kimbung Humphrey Mulenga Rachel Oelofse

Background: Non–sputum-based, point-of-care triage tests for pulmonary tuberculosis could enhance tuberculosis diagnostic programs. We assessed the accuracy of 2 finger-stick blood tests: the Cepheid 3-gene host-response cartridge (Xpert-HR), which measures host messenger RNA transcripts, and a 3-host protein multibiomarker test (MBT). Methods: We performed a prospective study of consecutive participants with symptoms compatible with tuberculosis in The Gambia, South Africa, Uganda, and Vietnam. A composite reference standard...

10.1093/cid/ciaf105 article EN Clinical Infectious Diseases 2025-04-16

Search engines are some of the most popular and profitable intelligent technologies in existence. Recent research, however, has suggested that search engines may be surprisingly dependent on user-created content like Wikipedia articles to address user information needs. In this paper, we perform a rigorous audit of the extent to which Google leverages Wikipedia and other user-generated content to respond to queries. Analyzing results for six types of important queries (e.g. popular, trending, expensive advertising), we observe that Wikipedia appears in over...

10.1609/icwsm.v13i01.3248 article EN Proceedings of the International AAAI Conference on Web and Social Media 2019-07-06

Many applications of geotagged content are predicated on the concept of localness (e.g., local restaurant recommendation, mining social media for local perspectives on an issue). However, definitions of who is a "local" in a given area are typically informal and ad-hoc and, as a result, approaches for localness assessment that have been used in the past have not been formally validated. In this paper, we begin the process of addressing these gaps in the literature. Specifically, we (1) formalize definitions of localness using themes identified in a 30-paper literature review, (2) develop...

10.1145/3173574.3173839 article EN 2018-04-20

Wikipedia is the largest web repository of free knowledge. Volunteer editors devote time and effort to creating and expanding articles in more than 300 language editions. As content quality varies from article to article, editors also spend substantial time rating articles with specific criteria. However, keeping these assessments complete and up-to-date is largely impossible given the ever-changing nature of Wikipedia. To overcome this limitation, we propose a novel computational framework for modeling the quality of Wikipedia articles. State-of-the-art...

10.1609/icwsm.v18i1.31436 article EN Proceedings of the International AAAI Conference on Web and Social Media 2024-05-28

A major challenge for many analyses of Wikipedia dynamics (e.g., imbalances in content quality, geographic differences in what content is popular, what types of articles attract more editor discussion) is grouping the very diverse range of Wikipedia articles into coherent, consistent topics. This problem has been addressed using various approaches based on Wikipedia's category network, WikiProjects, and external taxonomies. However, these approaches have always been limited in their coverage: typically, only a small subset of articles can be classified, or the method...

10.1145/3442442.3452347 article EN Companion Proceedings of The Web Conference 2018 2021-04-19

Although Couchsurfing and Airbnb are both online communities that help users host strangers in their homes, they differ in an important sense: Couchsurfing prohibits monetary payment while Airbnb is built around it. We conducted interviews with users experienced on both platforms ("dual-users") to better understand systemic differences between the platforms. Based on these interviews we propose that, compared with Couchsurfing, Airbnb: (1) appears to require higher quality services, (2) places more emphasis on places over people, and (3) shifts social power from hosts...

10.1145/3134693 article EN Proceedings of the ACM on Human-Computer Interaction 2017-12-06