Ruoxi Jia

ORCID: 0000-0001-9662-9556
Research Areas
  • Adversarial Robustness in Machine Learning
  • Privacy-Preserving Technologies in Data
  • Anomaly Detection Techniques and Applications
  • Machine Learning and Data Classification
  • Natural Language Processing Techniques
  • Cryptography and Data Security
  • Data Quality and Management
  • Advanced Neural Network Applications
  • Topic Modeling
  • Building Energy and Comfort Optimization
  • Machine Learning and Algorithms
  • Domain Adaptation and Few-Shot Learning
  • Indoor and Outdoor Localization Technologies
  • Stochastic Gradient Optimization Techniques
  • Advanced Malware Detection Techniques
  • Ethics and Social Impacts of AI
  • Explainable Artificial Intelligence (XAI)
  • Auction Theory and Applications
  • Blockchain Technology Applications and Security
  • Bayesian Modeling and Causal Inference
  • Digital Media Forensic Detection
  • Mobile Crowdsensing and Crowdsourcing
  • Data Mining Algorithms and Applications
  • Evacuation and Crowd Dynamics
  • Advanced Image and Video Retrieval Techniques

Virginia Tech
2020-2024

Google (United States)
2022

Columbia University
2022

University of California, Berkeley
2014-2021

Tsinghua University
2021

Harvard University Press
2021

Huazhong University of Science and Technology
2021

Southern California University for Professional Studies
2020

University of Southern California
2020

Berkeley College
2019

This paper studies model-inversion attacks, in which access to a model is abused to infer information about the training data. Since their first introduction by Fredrikson et al. (2014), such attacks have raised serious concerns, given that training data usually contain privacy-sensitive information. Thus far, successful attacks have only been demonstrated on simple models, such as linear regression and logistic regression. Previous attempts to invert neural networks, even ones with simple architectures, have failed to produce convincing...

10.1109/cvpr42600.2020.00033 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
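
A minimal sketch of the inversion idea on one of those simple models, assuming white-box access to hypothetical logistic-regression parameters: gradient ascent on the target-class confidence recovers a representative input. All names and constants below are illustrative, not the paper's method.

import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=64), 0.1            # hypothetical model parameters

def confidence(x):                          # P(y = 1 | x) under the model
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.zeros(64)                            # start from a blank input
for _ in range(200):
    p = confidence(x)
    x += 0.5 * p * (1.0 - p) * w            # gradient ascent on confidence
print(f"reconstructed input reaches confidence {confidence(x):.3f}")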

Backdoor attacks introduce manipulated data into a machine learning model's training set, causing the model to misclassify inputs carrying a trigger during testing and thereby achieve an outcome desired by the attacker. For a backdoor to bypass human inspection, it is essential that the injected data appear to be correctly labeled. Attacks with such a property are often referred to as "clean-label attacks." The success of current clean-label methods largely depends on access to the complete training set. Yet, accessing the complete dataset is often challenging or unfeasible since...

10.1145/3576915.3616617 article EN cc-by Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security 2023-11-15
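
As a toy illustration of the clean-label constraint (not the paper's specific attack), the sketch below stamps a trigger only onto target-class images, whose labels stay untouched, so a human label audit sees nothing wrong. The dataset, trigger shape, and poison rate are all assumptions.

import numpy as np

def stamp_trigger(img, size=3, value=1.0):
    patched = img.copy()
    patched[-size:, -size:] = value        # small corner patch as the trigger
    return patched

def poison_clean_label(images, labels, target_class, rate=0.05, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.where(labels == target_class)[0]        # target-class images only
    chosen = rng.choice(idx, size=max(1, int(rate * idx.size)), replace=False)
    for i in chosen:
        images[i] = stamp_trigger(images[i])         # label is left untouched
    return images, labels

images = np.random.rand(200, 32, 32)
labels = np.random.randint(0, 10, size=200)
images, labels = poison_clean_label(images, labels, target_class=3)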

Given a data set D containing millions of data points and a data consumer who is willing to pay $X to train a machine learning (ML) model over D, how should we distribute this $X to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality, and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: getting the values of all N data points requires O(2^N)...

10.14778/3342263.3342637 article EN Proceedings of the VLDB Endowment 2019-07-01
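
The O(2^N) cost above is why Shapley values are usually approximated. A standard Monte-Carlo permutation estimator, sketched here with a toy additive utility standing in for model validation performance:

import numpy as np

def utility(subset, values):               # toy stand-in for model accuracy
    return values[list(subset)].sum() if subset else 0.0

def monte_carlo_shapley(values, n_perm=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(values)
    shap = np.zeros(n)
    for _ in range(n_perm):
        perm = rng.permutation(n)
        prev, s = 0.0, set()
        for i in perm:
            s.add(int(i))
            u = utility(s, values)
            shap[i] += u - prev            # marginal contribution of point i
            prev = u
    return shap / n_perm

print(monte_carlo_shapley(np.array([1.0, 0.5, 0.0, 2.0])))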

Non-intrusive presence detection of individuals in commercial buildings is much easier to implement than intrusive methods such as passive infrared, acoustic sensors, and cameras. Individual power consumption, while providing useful feedback and motivation for energy saving, can also be used as a valuable source for presence detection. We conduct pilot experiments in an office setting to collect individual presence data by ultrasonic sensors, acceleration sensors, and WiFi access points, in addition to the plug-load monitoring data. PresenceSense (PS), a semi-supervised...

10.1109/tmc.2017.2684806 article EN IEEE Transactions on Mobile Computing 2017-03-20
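
A hedged sketch of the semi-supervised flavor the abstract hints at (PresenceSense itself is not reproduced here): a classifier trained on a few labeled power-feature traces pseudo-labels confident unlabeled traces and retrains. The features and thresholds are invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(20, 3, (10, 4)),     # labeled: occupied desks
                   rng.normal(5, 2, (10, 4))])     # labeled: vacant desks
y_lab = np.array([1] * 10 + [0] * 10)
X_unl = np.vstack([rng.normal(20, 3, (50, 4)),     # unlabeled power traces
                   rng.normal(5, 2, (50, 4))])

clf = LogisticRegression().fit(X_lab, y_lab)
for _ in range(3):                                  # self-training rounds
    proba = clf.predict_proba(X_unl)
    sure = proba.max(axis=1) > 0.9                  # confident pseudo-labels
    clf = LogisticRegression().fit(
        np.vstack([X_lab, X_unl[sure]]),
        np.concatenate([y_lab, proba[sure].argmax(axis=1)]))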

Backdoor attacks have been considered a severe security threat to deep learning. Such attacks can make models perform abnormally on inputs with predefined triggers while still retaining state-of-the-art performance on clean data. While backdoor attacks have been thoroughly investigated in the image domain from both the attackers' and defenders' sides, an analysis in the frequency domain has been missing thus far. This paper first revisits existing attacks from the frequency perspective and performs a comprehensive analysis. Our results show that many current attacks exhibit high-frequency...

10.1109/iccv48922.2021.01616 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
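
To make the frequency-domain perspective concrete, a small NumPy check of how much spectral energy sits outside the low-frequency band; the sharp checkerboard trigger is an assumption for the demo, not a trigger from the paper:

import numpy as np

def high_freq_fraction(img, cutoff=8):
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
    low = np.zeros_like(spec, dtype=bool)
    low[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = True
    return spec[~low].sum() / spec.sum()   # energy outside the low band

rng = np.random.default_rng(0)
clean = rng.normal(0.5, 0.1, (32, 32))
triggered = clean.copy()
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
triggered[-8:, -8:] = checker              # sharp, high-frequency trigger
print(high_freq_fraction(clean), high_freq_fraction(triggered))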

Smart buildings today are aimed at providing safe, healthy, comfortable, affordable, and beautiful spaces in a carbon- and energy-efficient way. They are emerging as complex cyber-physical systems with humans in the loop. Cost, the need to cope with increasing functional complexity, flexibility, fragmentation of the supply chain, and time-to-market pressure are rendering traditional heuristic and ad hoc design paradigms inefficient and insufficient for the future. In this paper, we present a platform-based design methodology for smart building...

10.1109/jproc.2018.2856932 article EN Proceedings of the IEEE 2018-09-01

Building control is a challenging task, not least because of complex building dynamics and multiple objectives that are often conflicting. To tackle this challenge, we explore an end-to-end deep reinforcement learning paradigm, which learns an optimal control strategy to reduce energy consumption and enhance occupant comfort from the data of building-controller interactions. Because real-world control policies need to be interpretable and efficient in learning, this work makes the following key contributions: (1) we investigated...

10.1016/j.egypro.2019.01.494 article EN Energy Procedia 2019-02-01
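
A toy version of the learn-from-interaction loop, with tabular Q-learning standing in for the paper's deep RL and a one-zone thermal model as the building; the reward mirrors the two stated objectives, energy and comfort. All dynamics constants are made up.

import numpy as np

rng = np.random.default_rng(0)
actions = [0.0, 1.0, 2.0]                  # heater power levels (toy units)
Q = np.zeros((21, len(actions)))           # one-degree bins over 15..35 C

def bin_of(temp):
    return int(np.clip(temp, 15.0, 35.0) - 15.0)

def step(temp, power):
    nxt = temp + 0.8 * power - 0.1 * (temp - 10.0)   # toy thermal dynamics
    reward = -0.3 * power - abs(nxt - 22.0)          # energy + comfort terms
    return nxt, reward

temp = 18.0
for _ in range(20000):
    s = bin_of(temp)
    a = rng.integers(3) if rng.random() < 0.1 else int(Q[s].argmax())
    temp, r = step(temp, actions[a])
    Q[s, a] += 0.1 * (r + 0.95 * Q[bin_of(temp)].max() - Q[s, a])

print("learned action at 18 C:", actions[int(Q[bin_of(18.0)].argmax())])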

Outlier detection and novelty detection are two important topics in anomaly detection. Supposing the majority of a dataset is drawn from a certain distribution, outlier detection and novelty detection both aim to detect data samples that do not fit that distribution. Outliers refer to such samples within the dataset, while novelties refer to new samples. Meanwhile, backdoor poisoning attacks on machine learning models are achieved by injecting poisoning samples into the training set, which can be regarded as "outliers" intentionally added by attackers. Differential privacy has been proposed...

10.48550/arxiv.1911.07116 preprint EN other-oa arXiv (Cornell University) 2019-01-01
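
To pin down the outlier-vs-novelty distinction, a minimal k-nearest-neighbor scoring sketch: the same distance score is computed for points inside the dataset (outliers) and for a new point (novelty). Purely illustrative.

import numpy as np

def knn_score(point, data, k=5):
    d = np.sort(np.linalg.norm(data - point, axis=1))
    return d[1:k + 1].mean() if d[0] == 0.0 else d[:k].mean()  # skip self

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))                       # in-distribution set
outlier_scores = [knn_score(x, data) for x in data]    # points in the dataset
novelty_score = knn_score(np.array([6.0, 6.0]), data)  # a new sample
print(max(outlier_scores), novelty_score)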

Model inversion (MI) attacks are aimed at reconstructing training data from model parameters. Such attacks have triggered increasing concerns about privacy, especially given a growing number of online model repositories. However, existing MI attacks against deep neural networks (DNNs) have large room for performance improvement. We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data. In particular, we train the discriminator to differentiate not only...

10.1109/iccv48922.2021.01587 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
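
A PyTorch sketch of the distillation idea, under the assumption that the discriminator gains a second head trained against the target model's soft labels on public data; the architecture and losses are simplified stand-ins, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, dim=128, n_classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, 256), nn.ReLU())
        self.real_fake = nn.Linear(256, 1)       # standard GAN head
        self.cls = nn.Linear(256, n_classes)     # distillation head

    def forward(self, x):
        h = self.body(x)
        return self.real_fake(h), self.cls(h)

disc = Discriminator()
x_pub = torch.randn(32, 128)                     # public data features
with torch.no_grad():
    soft = torch.softmax(torch.randn(32, 10), dim=1)  # target model's output
rf, logits = disc(x_pub)
loss = F.binary_cross_entropy_with_logits(rf, torch.ones_like(rf)) \
     + F.kl_div(F.log_softmax(logits, dim=1), soft, reduction="batchmean")
loss.backward()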

Training deep neural networks from scratch can be computationally expensive and requires a lot of training data. Recent work has explored different watermarking techniques to protect pre-trained models from potential copyright infringements. However, these techniques are vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic...

10.1145/3433210.3453079 preprint EN 2021-05-24
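
A rough fine-tuning sketch in the spirit of REFIT, assuming only clean data and a hypothetical watermarked classifier: continued training with a deliberately large learning rate and no knowledge of the watermark scheme. The model and data are placeholders.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)  # deliberately large lr
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):                       # fine-tuning steps on clean data
    x = torch.randn(64, 20)                # stand-in clean batch
    y = (x.sum(dim=1) > 0).long()          # stand-in correct labels
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()                             # weights drift; watermark degrades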

This paper studies defense mechanisms against model inversion (MI) attacks -- a type of privacy attack aimed at inferring information about the training data distribution given access to a target machine learning model. Existing defenses rely on model-specific heuristics or noise injection. While able to mitigate attacks, existing methods significantly hinder model performance. There remains the question of how to design a defense mechanism that is applicable to a variety of models and achieves a better utility-privacy tradeoff. In this...

10.1609/aaai.v35i13.17387 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
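
For context, a sketch of the noise-injection style of defense the abstract cites as prior work (not this paper's mechanism): perturbing and renormalizing output confidences weakens the inversion signal at some utility cost. The noise scale is arbitrary.

import numpy as np

def defended_predict(probs, scale=0.05, seed=None):
    rng = np.random.default_rng(seed)
    noisy = np.clip(probs + rng.normal(0.0, scale, probs.shape), 1e-6, None)
    return noisy / noisy.sum(axis=-1, keepdims=True)   # valid distribution

probs = np.array([[0.70, 0.20, 0.10]])
print(defended_predict(probs, seed=0))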

Recent studies show that state-of-the-art deep neural networks are vulnerable to model inversion attacks, in which access to a model is abused to reconstruct private training data of any given target class. Existing attacks rely on having access to either the complete target model (whitebox) or the model's soft-labels (blackbox). However, no prior work has addressed the harder but more practical scenario in which the attacker only sees the predicted label, without a confidence measure. In this paper, we introduce an algorithm, Boundary-Repelling Model...

10.1109/cvpr52688.2022.01462 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
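
A heavily simplified label-only loop in the spirit described, assuming a hypothetical linear model that returns hard labels only: sphere samples whose predicted label leaves the target class indicate the boundary direction, and the estimate is pushed the other way.

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)                    # hypothetical target model

def predict_label(x):                      # hard label only; 1 = target class
    return int(w @ x > 0)

x = np.sign(w) * np.abs(rng.normal(size=16))   # start inside the target class
for _ in range(300):
    dirs = rng.normal(size=(32, 16))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    off = np.array([predict_label(x + 0.3 * d) != 1 for d in dirs])
    if off.any():
        x -= 0.1 * dirs[off].mean(axis=0)  # step away from the boundary
    else:
        x *= 1.05                          # no boundary nearby: move deeper
print("final predicted label:", predict_label(x))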

Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors at inference time, they do not cover the safety risks when fine-tuning privileges are extended to end-users....

10.48550/arxiv.2310.03693 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Non-intrusive presence detection of individuals in commercial buildings is much easier to implement than intrusive methods such as passive infrared, acoustic sensors, and cameras. Individual power consumption, while providing useful feedback and motivation for energy saving, can also be used as a valuable source for presence detection. We conduct pilot experiments in an office setting to collect individual presence data by ultrasonic sensors, acceleration sensors, and WiFi access points, in addition to the plug-load monitoring data. PresenceSense (PS), a semi-supervised...

10.1145/2674061.2674073 article EN 2014-10-31

Large-scale sensing and actuation infrastructures have allowed buildings to achieve significant energy savings; at the same time, these technologies introduce privacy risks that must be addressed. In this paper, we present a framework for modeling the trade-off between improved control performance and increased privacy risks due to occupancy sensing. More specifically, we consider occupancy-based HVAC control as the objective and the location traces of individual occupants as the private variables. Previous studies have shown that location information can be inferred...

10.1145/3055004.3055007 article EN 2017-04-10

With the increasing applications of language models, it has become crucial to protect these models from leaking private information. Previous work attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees. However, applying classical differential privacy leads to poor model performance, as the underlying privacy notion is over-pessimistic and provides undifferentiated protection for all tokens in the data. Given that private information in natural language is sparse (for example, the bulk of an email might not carry personally...

10.18653/v1/2022.naacl-main.205 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01
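
A toy sketch of the selectivity idea, assuming a regex policy flags the private tokens: only flagged spans are perturbed (here, masked) while the rest of the text is left intact, in contrast to uniform protection of every token. The patterns are illustrative, not the paper's policy mechanism.

import re

PRIVATE = re.compile(r"[\w.]+@[\w.]+|\b\d{3}-\d{4}\b")  # emails, phone tails

def protect(text):
    return PRIVATE.sub("<MASK>", text)     # only flagged spans are perturbed

print(protect("Reach me at jane@example.com or 555-0199 after lunch."))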

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about...

10.48550/arxiv.2502.05206 preprint EN arXiv (Cornell University) 2025-02-02

Monitoring an individual electrical load's energy usage is of great significance in energy-efficient buildings, as it underlies sophisticated load control and optimization strategies. Non-intrusive load monitoring (NILM) provides an economical tool to access per-load power consumption without deploying fine-grained, large-scale smart meters. However, existing NILM approaches require training data to be collected by sub-metering appliances, as well as prior knowledge about the number of appliances attached to the meter, which are...

10.1109/smartgridcomm.2015.7436411 article EN 2015-11-01
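
A minimal event-based NILM sketch under the no-sub-metering constraint: step changes in the aggregate power signal are detected and their magnitudes reported as candidate appliance on/off events. The threshold and synthetic trace are assumptions.

import numpy as np

def detect_events(power, threshold=30.0):
    steps = np.diff(power)
    idx = np.where(np.abs(steps) > threshold)[0]
    return idx, steps[idx]                 # event times and step sizes

agg = np.concatenate([np.full(50, 100.0),  # baseline load
                      np.full(50, 160.0),  # +60 W appliance switches on
                      np.full(50, 100.0)]) # appliance switches off
agg += np.random.default_rng(0).normal(0, 2.0, agg.size)
times, sizes = detect_events(agg)
print(list(zip(times.tolist(), np.round(sizes, 1).tolist())))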

We present results from a set of experiments in this pilot study to investigate the causal influence of user activity on various environmental parameters monitored by occupant-carried multi-purpose sensors. Hypotheses with respect to each type of measurement are verified, including temperature, humidity, and light level collected during eight typical activities: sitting in a lab / cubicle, indoor walking / running, resting after physical activity, climbing stairs, taking elevators, and outdoor walking. Our...

10.1109/iecon.2014.7049320 article EN IECON 2014 - 40th Annual Conference of the IEEE Industrial Electronics Society 2014-10-01