- Topic Modeling
- Natural Language Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Hate Speech and Cyberbullying Detection
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Authorship Attribution and Profiling
- Advanced Graph Neural Networks
- Domain Adaptation and Few-Shot Learning
- Cybercrime and Law Enforcement Studies
- Complex Network Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Spam and Phishing Detection
- Misinformation and Its Impacts
- Recommender Systems and Techniques
- Consumer Market Behavior and Pricing
- Time Series Analysis and Forecasting
- Graph Theory and Algorithms
- Image Retrieval and Classification Techniques
- Bioinformatics and Genomic Networks
- Data Visualization and Analytics
- Biomedical Text Mining and Ontologies
- Text Readability and Simplification
- Speech and Dialogue Systems
- Data Quality and Management
Dolby (United States)
2024
The Ohio State University
2019-2023
Adobe Systems (United States)
2017-2018
Indian Institute of Technology Roorkee
2017
We introduce LEAF-QA, a comprehensive dataset of 250,000 densely annotated figures/charts, constructed from real-world open data sources, along with 2 million question-answer (QA) pairs querying the structure and semantics of these charts. LEAF-QA highlights the problem of multimodal QA, which is notably different from conventional visual QA (VQA), and has recently gained interest in the community. Furthermore,...
Browsers often include security features to detect phishing web pages. In the past, some browsers evaluated an unknown URL for inclusion in a list of known phishing pages. However, as the number of URLs and known phishing pages continued to increase at a rapid pace, browsers started to include one or more machine learning classifiers as part of their security services that aim to better protect end users from harm. While additional information could be used, browsers typically evaluate every unknown URL using some classifier in order to quickly detect these phishing pages. Early phishing detection used standard machine learning classifiers, but...
Accurately discovering user intents from their written or spoken language plays a critical role in natural language understanding and automated dialog response. Most existing research models this as a classification task with a single intent label per utterance, grouping user utterances into a single intent type from a set of categories known beforehand. Going beyond this formulation, we define and investigate a new problem of open intent discovery. It involves discovering one or more generic intent types from text utterances that may not have been encountered during training....
A sizable proportion of deployed machine learning models make their decisions in a black-box manner. Such decision-making procedures are susceptible to intrinsic biases, which has led to a call for accountability in deployed decision systems. In this work, we investigate mechanisms that help audit claimed mathematical guarantees of the fairness of such systems. We construct AVOIR, a system that reduces the number of observations required for the runtime monitoring of probabilistic assertions over fairness metrics specified on decision functions associated with AI...
Detecting and identifying user intent from text, both written and spoken, plays an important role in modelling and understanding dialogs. Existing research on intent discovery models it as a classification task with a predefined set of known categories. To generalize beyond these preexisting classes, we define a new task of open intent discovery. We investigate how intent can be generalized to those not seen during training. To this end, we propose a two-stage approach to this task: predicting whether an utterance contains an intent, and then tagging the...
Dynamically extracting and representing continually evolving knowledge entities is an essential scaffold for grounded intelligence and decision making. Creating knowledge schemas for newly emerging, unfamiliar, domain-specific ideas or events poses the following challenges: (i) detecting relevant, often previously unknown concepts associated with the new domain; and (ii) learning ontological, semantically accurate relationships among the new concepts, despite having severely limited annotated data. To this end, we propose a...
Darknet market forums are frequently used to exchange illegal goods and services between parties who use encryption to conceal their identities. The Tor network is used to host these markets, which guarantees additional anonymization from IP and location tracking, making it challenging to link across malicious users using multiple accounts (sybils). Additionally, users migrate to new forums when one is closed, further increasing the difficulty of linking users across multiple forums. We develop a novel stylometry-based multitask learning approach for...
Augmented Reality (AR) based applications have existed for some time; however, their true potential in digital marketing remains unexploited. To bridge this gap, we create a novel consumer targeting system. First, we analyze a consumer's interactions on AR-based retail apps to identify her preferred purchase viewpoint during the session. We then target the consumer through a personalized catalog, created by embedding recommended products in the viewpoint visual. The color and style of the embedded product are matched with recommendations, text...
The increasing use of ad blocking software poses a major threat for publishers in the loss of online ad revenue, and for advertisers in the loss of audience. Major publishers have adopted various anti-ad blocking strategies such as denial of access to website content and asking users to subscribe to paid ad-free versions. However, publishers are unsure about the true impact of these strategies [2, 3]. We posit that the real problem lies in the measurement of effectiveness because the existing methods compare metrics after implementation of such strategies with metrics just before implementation, making them error prone...
Conformal prediction has become increasingly popular for quantifying the uncertainty associated with machine learning models. Recent work in graph uncertainty quantification has built upon this approach for conformal graph prediction. The nascent nature of these explorations has led to conflicting choices for implementations, baselines, and method evaluation. In this work, we analyze the design choices made in the literature and discuss the tradeoffs associated with existing methods. Building on the existing implementations for existing methods, we introduce techniques to scale existing methods to large-scale...
Conformal Prediction is a robust framework that ensures reliable coverage across machine learning tasks. Although recent studies have applied conformal prediction to graph neural networks, they have largely emphasized post-hoc prediction set generation. Improving conformal prediction during the training stage remains unaddressed. In this work, we tackle this challenge from a denoising perspective by introducing SparGCP, which incorporates graph sparsification and a conformal prediction-specific objective into GNN training. SparGCP employs a parameterized...
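For readers unfamiliar with the post-hoc prediction set generation mentioned in the two abstracts above, the following is a minimal sketch of textbook split conformal prediction for classification. It is not SparGCP or any method from these papers; the function names `conformal_threshold` and `prediction_set`, the toy probabilities, and the choice of `1 - p(true class)` as the nonconformity score are all illustrative assumptions.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a held-out calibration set (hypothetical helper)."""
    # Nonconformity score: 1 - probability the model gave the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration data (assumed): class probabilities and true labels.
cal_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
cal_labels = [0, 0, 1, 1]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
uncertain_set = prediction_set([0.55, 0.45], threshold)
print(threshold, uncertain_set)
```

Under the usual exchangeability assumption, sets built this way contain the true label with probability at least `1 - alpha`; a larger set signals a less certain prediction.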
This paper describes the OSU submission to the SIGMORPHON 2019 shared task, Crosslinguality and Context in Morphology. Our system addresses the contextual morphological analysis subtask of Task 2, which is to produce the morphosyntactic description (MSD) of each fully inflected word within a given sentence. We frame this as a sequence generation task and employ a neural encoder-decoder (seq2seq) architecture to generate the sequence of MSD tags given the encoded representation of each token. Follow-up analyses reveal that our system most significantly improves...
Traditional Visual Question Answering (VQA) datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need to improve the balance of such datasets to reduce the system's dependency on memorized linguistic features and statistical biases, while aiming for enhanced visual understanding. However, it is unclear whether any latent patterns exist to quantify and explain these failures. As an initial step towards better...
Graphs are a natural abstraction for many problems where nodes represent entities and edges represent a relationship across entities. An important area of research that has emerged over the last decade is the use of graphs as a vehicle for non-linear dimensionality reduction in a manner akin to previous efforts based on manifold learning, with uses for downstream database processing, machine learning, and visualization. In this systematic yet comprehensive experimental survey, we benchmark several popular network representation learning methods...
Orchestration of campaigns for online display advertising requires marketers to forecast audience size at the granularity of specific attributes of web traffic, characterized by the categorical nature of all attributes (e.g. {US, Chrome, Mobile}). With each attribute taking many values, the very large attribute combination set makes estimating audience size for any specific attribute combination challenging. We modify Eclat, a frequent itemset mining (FIM) algorithm, to accommodate categorical variables. For consequent frequent and infrequent itemsets, we then provide forecasts using time series...
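As background for the abstract above, here is a minimal sketch of the standard Eclat algorithm, which finds frequent itemsets by intersecting per-item sets of transaction ids. It does not reproduce the authors' modification for categorical variables; encoding each attribute as an `attribute=value` string, the `eclat` function name, and the toy `visits` data are assumptions made for illustration.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(items): support_count} for all frequent itemsets."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tidset) and is frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Grow the itemset by intersecting with the remaining candidates.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


# Toy web-traffic records (assumed), one set of attribute values per visit.
visits = [
    {"country=US", "browser=Chrome", "device=Mobile"},
    {"country=US", "browser=Chrome", "device=Desktop"},
    {"country=US", "browser=Firefox", "device=Mobile"},
    {"country=IN", "browser=Chrome", "device=Mobile"},
]

result = eclat(visits, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

The support count of each attribute combination is the kind of audience-size quantity that could then be tracked over time and fed to a forecasting model.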
Natural disasters such as floods, forest fires, and hurricanes can cause catastrophic damage to human life and infrastructure. We focus on response to hurricanes caused by both river water flooding and storm surge. Using models for storm surge simulation and flood extent prediction, we generate forecasts about areas likely to be highly affected by the disaster. Further, we overlay the simulation results with information about traffic incidents to correlate traffic incidents with other data modality. We present these results in a modularized, interactive map-based visualization, which can help...