- Blockchain Technology Applications and Security
- Semantic Web and Ontologies
- Biomedical Text Mining and Ontologies
- Scientific Computing and Data Management
- Explainable Artificial Intelligence (XAI)
- Topic Modeling
- IoT and Edge/Fog Computing
- FinTech, Crowdfunding, Digital Finance
- Data Quality and Management
- Machine Learning in Healthcare
- Privacy-Preserving Technologies in Data
- Research Data Management Practices
- Service-Oriented Architecture and Web Services
- Spam and Phishing Detection
- Big Data and Business Intelligence
- Artificial Intelligence in Healthcare and Education
- Cryptography and Data Security
- Digital Rights Management and Security
- Crime, Illicit Activities, and Governance
- Mental Health Research Topics
- Digital Mental Health Interventions
- Peer-to-Peer Network Technologies
- Recommender Systems and Techniques
- Web Data Mining and Analysis
- Imbalanced Data Classification Techniques
Rensselaer Polytechnic Institute
2012-2025
Massachusetts Institute of Technology
2010-2019
Oracle (United States)
2016
IIT@MIT
2009
We evaluate Kahneman-Tversky Optimization (KTO) as a fine-tuning method for large language models (LLMs) in federated learning (FL) settings, comparing it against Direct Preference Optimization (DPO). Using Alpaca-7B as the base model, we fine-tune on a realistic dataset under both methods and evaluate performance using the MT-Bench-1, Vicuna, and AdvBench benchmarks. Additionally, we introduce a redistributed setup, where only KTO is applicable due to its ability to handle single-response feedback, unlike DPO's reliance on paired...
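The contrast the abstract draws — KTO needs only a single response plus a thumbs-up/down label, while DPO needs paired preferences — can be seen in the shape of KTO's per-example loss. Below is a minimal sketch of that loss in plain Python; `beta` and the KL reference point `z_ref` (a batch-level estimate in the actual method, simplified to a constant here) are illustrative, and this is not the paper's implementation.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def kto_loss(logp_policy: float, logp_ref: float,
             desirable: bool, beta: float = 0.1,
             z_ref: float = 0.0) -> float:
    """Per-example KTO loss sketch.

    logp_policy / logp_ref: log-probability of ONE response under the
    policy and the frozen reference model -- no paired comparison is
    needed, only a desirable/undesirable label for that response.
    """
    reward = logp_policy - logp_ref            # implicit reward margin
    if desirable:
        value = sigmoid(beta * (reward - z_ref))
    else:
        value = sigmoid(beta * (z_ref - reward))
    return 1.0 - value                         # minimizing pushes value toward 1
```

Minimizing this loss raises the policy's probability of desirable responses relative to the reference model and lowers it for undesirable ones, which is why per-client single-response feedback suffices in the redistributed FL setup.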
Federated Learning (FL) enables collaborative model training without sharing raw data, preserving privacy while harnessing distributed datasets. However, traditional FL systems often rely on centralized aggregation mechanisms, introducing trust issues, single points of failure, and limited mechanisms for incentivizing meaningful client contributions. These challenges are exacerbated as FL scales to train resource-intensive models, such as large language models (LLMs), requiring scalable,...
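The centralized aggregation step the abstract critiques is, in its standard FedAvg form, just a sample-weighted average of client parameter vectors — a sketch (with flat lists standing in for model weights) makes the single-point-of-trust concern concrete: one server must run this honestly.

```python
def fedavg(client_updates):
    """Weighted average of client parameter vectors (FedAvg-style).

    client_updates: list of (weights, n_samples) pairs, where
    `weights` is a flat list of floats. A single central server
    executes this step in traditional FL -- the trust bottleneck
    and single point of failure the abstract refers to.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    agg = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            agg[i] += w * (n / total)
    return agg
```

For example, a client with three times the data pulls the average three times as hard, which is also why contribution incentives matter: the aggregate is only as good as what clients send.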
People can effect change in their eating patterns by substituting ingredients in recipes. Such substitutions may be motivated by specific goals, like modifying the intake of a nutrient or avoiding a particular category of ingredients. Determining how to modify a recipe is difficult because people need to 1) identify which ingredients can act as valid replacements for the original and 2) figure out whether the substitution is “good” in context, considering factors such as allergies, the nutritional contents of individual ingredients, and other dietary...
Dataset distillation generates a small set of information-rich instances from a large dataset, resulting in reduced storage requirements, privacy or copyright risks, and computational costs for downstream modeling, though much of the research has focused on the image data modality. We study tabular data distillation, which brings novel challenges such as inherent feature heterogeneity and the common use of non-differentiable learning models (such as decision tree ensembles and nearest-neighbor predictors). To mitigate...
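To make the goal concrete: distillation replaces many rows with a few informative ones that still train a usable model. The sketch below uses a deliberately simple stand-in — one per-feature-mean prototype per class — rather than the paper's method, precisely because a prototype set works with non-differentiable downstream learners like the nearest-neighbor predictor shown.

```python
from collections import defaultdict

def distill_by_prototypes(rows, labels):
    """Distill a labeled tabular dataset to one prototype per class
    (per-feature mean). A toy distilled set: |classes| rows instead
    of |dataset| rows."""
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    protos = {}
    for y, group in by_class.items():
        dim = len(group[0])
        protos[y] = [sum(r[i] for r in group) / len(group) for i in range(dim)]
    return protos

def predict_1nn(protos, row):
    """Classify a row by its nearest prototype (squared Euclidean),
    a non-differentiable downstream model."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda y: dist(protos[y], row))
```

No gradients flow through either function, which is exactly the obstacle for image-style distillation methods that optimize synthetic data by backpropagating through a differentiable learner.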
This paper introduces an explanation framework designed to enhance the quality of rules in knowledge-based reasoning systems based on dataset-driven insights. The traditional method for rule induction from data typically requires labor-intensive labeling and data-driven learning. Our framework provides an alternative that instead allows refinement of existing rules: it generates explanations of inferences and leverages human interpretation to refine the rules. It supports four complementary explanation types: trace-based, contextual, contrastive,...
Smartphones are being used for a wide range of activities, including messaging, social networking, calendar and contact management, as well as location-based context-aware applications. The ubiquity of handheld computing technology has been found to be especially useful in disaster relief operations. Our focus is to enable developers to quickly deploy applications that take advantage of key sources fundamental to today's networked citizens: Twitter feeds, Facebook posts, current news releases, and government data. These...
Many access control systems, particularly those utilized in hospital environments, exercise optimistic security, because preventing access to information may have undesirable consequences. However, in the wrong hands, these over-broad permissions can result in privacy violations. To circumvent this issue, we developed Privacy Enabling Transparent Systems (PETS), which makes transparency a key component of systems architectures. PETS is built on open web standards and introduces the Provenance Tracking Network (PTN),...
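The transparency idea — keep the optimistic, over-broad access but make every access visible and auditable after the fact — can be illustrated with a tamper-evident, append-only access log. This hash-chained sketch is an illustration of the principle, not PETS's actual protocol or the PTN design.

```python
import hashlib
import json

class ProvenanceLog:
    """Append-only access log where each entry commits to the
    previous one via a hash chain, so retroactive edits to
    who-accessed-what become detectable on verification."""

    def __init__(self):
        self.entries = []

    def record(self, actor: str, resource: str, action: str):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"actor": actor, "resource": resource,
                "action": action, "prev": prev}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry fails."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("actor", "resource", "action", "prev")}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != digest:
                return False
            prev = e["hash"]
        return True
```

In the hospital setting this inverts the enforcement point: instead of blocking a clinician at access time, the system guarantees the access leaves an unalterable trace that patients or auditors can inspect.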
With the increased use of AI methods to provide recommendations in health, specifically in the dietary recommendation space, there is also an increased need for explainability of those recommendations. Such explanations would benefit users of these systems by empowering them with justifications for following the system's suggestions. We present the Food Explanation Ontology (FEO), which provides a formalism for modeling food-related explanations. FEO models food recommendations, using concepts from the explanation domain to create responses to user questions...
The rapid development of low-cost sensors, smart devices, communication networks, and learning algorithms has enabled data-driven decision making in large-scale systems. However, building platforms for such Internet of Things (IoT) applications that collect data in a cohesive yet simple manner is not very well understood. MIT App Inventor [1] is an open-source, user-friendly interface for developing mobile applications that has been used by over ten million users worldwide. We have added IoT capability to the platform, where people can...
Cryptocurrency is a fast-moving space, with a continuous influx of new projects every year. However, an increasing number of incidents in the space, such as hacks and security breaches, threaten the growth of the community and the development of the technology. This dynamic and often tumultuous landscape is vividly mirrored and shaped by discussions within "Crypto Twitter," a key digital arena where investors, enthusiasts, and skeptics converge, revealing real-time sentiments and trends through social media interactions. We present our analysis of...
Data distillation is a technique for reducing a large dataset into a smaller dataset. The smaller dataset can then be used to train a model which performs comparably to one trained on the full dataset. Past works have examined this approach for image datasets, focusing on neural networks as target models. However, tabular datasets pose new challenges not seen in images. A sample is a one-dimensional vector, unlike the two- (or three-) dimensional pixel grid of images, and non-NN models such as XGBoost often outperform neural network (NN) based models. Our contribution in this work...
In the past decade, trustworthy Artificial Intelligence (AI) has emerged as a focus for the AI community to ensure better adoption of AI models, and explainable AI is a cornerstone in this area. Over the years, the focus has shifted from building transparent methods to making recommendations on how to make black-box or opaque machine learning models and their results more understandable to expert and non-expert users. In our previous work, to address the goal of supporting user-centered explanations that make a model explainable, we developed an...
Explainability has been an important goal since the early days of Artificial Intelligence. Several approaches for producing explanations have been developed. However, many of these approaches were tightly coupled with the capabilities of the artificial intelligence systems of the time. With the proliferation of AI-enabled systems in sometimes critical settings, there is a need for them to be explainable to end-users and decision-makers. We present a historical overview of explainable artificial intelligence systems, with a focus on knowledge-enabled systems, spanning expert systems, cognitive assistants,...
Given the ubiquity of data on the web, and the lack of usage restriction enforcement mechanisms, stories of personal, creative, and other kinds of misuses are on the rise. There should be both sociological and technological mechanisms that facilitate accountability on the web and would prevent such misuses. Sociological mechanisms appeal to the consumer's self-interest in adhering to the provider's desires. This involves a system of rewards, such as recognition and financial incentives, and deterrents, such as prohibitions by law for any violations and social pressure. But there is no...