- Wikis in Education and Collaboration
- Natural Language Processing Techniques
- Topic Modeling
- Cancer-related gene regulation
- Open Source Software Innovations
- Human Mobility and Location-Based Analysis
- Hate Speech and Cyberbullying Detection
- Digital Games and Media
- Recommender Systems and Techniques
- Semantic Web and Ontologies
- Sentiment Analysis and Opinion Mining
- Complex Network Analysis Techniques
- Digital Marketing and Social Media
- Geographic Information Systems Studies
- Social Media and Politics
- Pneumonia and Respiratory Infections
- Web and Library Services
- Knowledge Management and Sharing
- Image Processing and 3D Reconstruction
- Mental Health via Writing
- Digital Communication and Language
- Tuberculosis Research and Epidemiology
- Urban Transport and Accessibility
- Data-Driven Disease Surveillance
- FinTech, Crowdfunding, Digital Finance
Texas MicroPower (United States), 2025
Wikimedia Foundation, 2019-2024
Stellenbosch University, 2023
University of Minnesota System, 2015-2021
Northwestern University, 2017-2018
Twin Cities Orthopedics, 2016
University of Minnesota, 2016
Reliant Medical Group, 2016
University of Pittsburgh, 2012
Providence College, 2012
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS...
Computational approaches to text analysis are useful in understanding aspects of online interaction, such as opinions and subjectivity in text. Yet, recent studies have identified various forms of bias in language-based models, raising concerns about the risk of propagating social biases against certain groups based on sociodemographic factors (e.g., gender, race, geography). In this study, we contribute a systematic examination of the application of language models to study discourse on aging. We analyze the treatment...
Emoji are commonly used in modern text communication. However, as graphics with nuanced details, emoji may be open to interpretation. Emoji also render differently on different viewing platforms (e.g., Apple’s iPhone vs. Google’s Nexus phone), potentially leading to communication errors. We explore whether emoji renderings or differences across platforms give rise to diverse interpretations of emoji. Through an online survey, we solicit people’s interpretations of a sample of the most popular emoji characters, each rendered for multiple...
While Wikipedia is a subject of great interest in the computing literature, very little work has considered Wikipedia’s important relationships with other information technologies like search engines. In this paper, we report the results of two deception studies whose goal was to better understand the critical relationship between Wikipedia and Google. These studies silently removed Wikipedia content from Google and examined the effect of doing so on participants’ interactions with both websites. Our findings demonstrate and characterize an...
Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. In this paper, we explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on local...
Geotagged tweets and other forms of social media volunteered geographic information (VGI) are becoming increasingly critical to many applications and scientific studies. An important assumption underlying much of this research is that social media VGI is "local", or that its geotags correspond closely with the general home locations of its contributors. We demonstrate through a study on three separate social media communities (Twitter, Flickr, Swarm) that this localness assumption holds in only about 75% of cases. In addition, we show that the contours of localness follow...
Much research has shown that social media platforms have substantial population biases. However, very little is known about how these biases affect the many algorithms that rely on social media data. Focusing on the case study of geolocation inference algorithms and their performance across the urban-rural spectrum, we establish that these algorithms exhibit significantly worse performance for underrepresented populations (i.e. rural users). We further establish that this finding is robust across both text- and network-based algorithm designs. We also show that some of this bias can be attributed to the design...
The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distributed governance that accounts for human values and is grounded by an international research collaboration that brings together researchers and practitioners from 60 countries. The framework we...
The extensive Wikipedia literature has largely considered Wikipedia in isolation, outside of the context of its broader Internet ecosystem. Very recent research has demonstrated the significance of this limitation, identifying critical relationships between Google and Wikipedia that are highly relevant to many areas of Wikipedia-based research and practice. This paper extends this recent research beyond search engines to examine Wikipedia's relationships with large-scale online communities, Stack Overflow and Reddit in particular. We find evidence of consequential, albeit unidirectional...
Recent studies have identified various forms of bias in language-based models, raising concerns about the risk of propagating social biases against certain groups based on sociodemographic factors (e.g., gender, race, geography). In this study, we analyze the treatment of age-related terms across 15 sentiment analysis models and 10 widely-used GloVe word embeddings and attempt to alleviate bias through a method of processing model training data. Our results show that significant age bias is encoded in the outputs of many algorithms...
Subsea oil and gas production is an exciting area of growth for companies operating in deepwater across the world. As energy demands increase, improving subsea production could play a crucial role in supplying the world's energy. This paper aims to show that Drag Reducing Agents (DRA) could help improve subsea production through analyzing historical DRA performance in multiphase applications, and to provide recommendations on which subsea systems could benefit from the use of DRA. While subsea DRA injection has not been implemented, this paper summarizes the development,...
The Mayo Clinic participated in the Depression Improvement Across Minnesota, Offering a New Direction model at two Family Clinics, that is, the Rochester Northwest and Northeast sites. Although the clinics demonstrated the best 6-month remission rates in the state during the first year of implementation, they were retrospectively found to differ on several process issues and measures related to the populations served. Six-month significantly better clinic; yet, had more patient contacts. Differences in activation into care...
Millions of people use platforms such as Google Maps to search for routes to their desired destinations. Recently, researchers and mapping platforms have shown growing interest in optimizing routes for criteria other than travel time, e.g. simplicity, safety, and beauty. However, despite the ubiquity of algorithmic routing and its potential to define how millions of people move around the world, very little is known about the externalities that arise when adopting these new optimization criteria, e.g. the redistribution of traffic to certain neighborhoods...
Background: To improve tuberculosis (TB) diagnosis, the World Health Organisation (WHO) has called for a non-sputum based triage test to focus TB testing on people with a high likelihood of having active pulmonary tuberculosis (TB). Various host or pathogen biomarker-based devices are in the design stage and require validity assessment. Host biomarkers have shown promise to accurately rule out TB, but further research is required to determine generalisability. The TriageTB diagnostic study aims to assess...
Background: Non–sputum-based, point-of-care triage tests for pulmonary tuberculosis could enhance diagnostic programs. We assessed the accuracy of 2 finger-stick blood tests: the Cepheid 3-gene host-response cartridge (Xpert-HR), which measures host messenger RNA transcripts, and a 3-host protein multibiomarker test (MBT). Methods: We performed a prospective study of consecutive participants with symptoms compatible with tuberculosis in The Gambia, South Africa, Uganda, and Vietnam. A composite reference standard...
Search engines are some of the most popular and profitable intelligent technologies in existence. Recent research, however, has suggested that search engines may be surprisingly dependent on user-created content like Wikipedia articles to address user information needs. In this paper, we perform a rigorous audit of the extent to which Google leverages Wikipedia and other user-generated content to respond to queries. Analyzing results for six types of important queries (e.g. popular, trending, expensive advertising), we observe that Wikipedia appears in over...
Many applications of geotagged content are predicated on the concept of localness (e.g., local restaurant recommendation, mining social media for local perspectives on an issue). However, definitions of who is a "local" in a given area are typically informal and ad-hoc and, as a result, approaches to localness assessment that have been used in the past have not been formally validated. In this paper, we begin the process of addressing these gaps in the literature. Specifically, we (1) formalize definitions of localness using themes identified in a 30-paper literature review, (2) develop...
Wikipedia is the largest web repository of free knowledge. Volunteer editors devote time and effort to creating and expanding articles in more than 300 language editions. As content quality varies from article to article, editors also spend substantial time rating articles with specific criteria. However, keeping these assessments complete and up-to-date is largely impossible given the ever-changing nature of Wikipedia. To overcome this limitation, we propose a novel computational framework for modeling the quality of Wikipedia articles. State-of-the-art...
A major challenge for many analyses of Wikipedia dynamics (e.g., imbalances in content quality, geographic differences in what content is popular, what types of articles attract more editor discussion) is grouping the very diverse range of Wikipedia articles into coherent, consistent topics. This problem has been addressed using various approaches based on Wikipedia's category network, WikiProjects, and external taxonomies. However, these approaches have always been limited in their coverage: typically, only a small subset of articles can be classified, or the method...
Although Couchsurfing and Airbnb are both online communities that help users host strangers in their homes, they differ in an important sense: Couchsurfing prohibits monetary payment while Airbnb is built around it. We conducted interviews with users experienced on both platforms ("dual-users") to better understand systemic differences between the platforms. Based on these interviews, we propose that, compared with Couchsurfing, Airbnb: (1) appears to require higher quality services, (2) places more emphasis on places over people, (3) shifts social power from hosts...