- Privacy-Preserving Technologies in Data
- Cryptography and Data Security
- Stochastic Gradient Optimization Techniques
- Privacy, Security, and Data Protection
- Handwritten Text Recognition Techniques
- Adversarial Robustness in Machine Learning
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Mobile Crowdsensing and Crowdsourcing
- Anomaly Detection Techniques and Applications
- Landslides and related hazards
- Data-Driven Disease Surveillance
- Topic Modeling
- Neural Networks and Applications
- Seismology and Earthquake Studies
- Data Management and Algorithms
- Complexity and Algorithms in Graphs
- Internet Traffic Analysis and Secure E-voting
- Natural Language Processing Techniques
- Flood Risk Assessment and Management
- Hydrological Forecasting Using AI
- Census and Population Estimation
- Multimodal Machine Learning Applications
- Model Reduction and Neural Networks
- Time Series Analysis and Forecasting
Pennsylvania State University
2016-2025
United States Census Bureau
2018-2025
Yahoo (United States)
2009-2018
Adobe Systems (United States)
2017
Park University
2015
University of Waterloo
2013
Cornell University
2002-2011
Yahoo (Spain)
2008
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes. In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity...
Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain "identifying" attributes. In this paper we show with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little...
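The definition quoted in the two abstracts above can be illustrated with a minimal sketch. The function names, the toy records, and the attribute names (`zip`, `age`, `condition`) are hypothetical and are not taken from the papers; the second helper only surfaces the "little diversity" situation the abstracts mention.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def sensitive_values_per_group(records, quasi_identifiers, sensitive):
    """Map each quasi-identifier group to the set of sensitive values it contains."""
    groups = {}
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        groups.setdefault(key, set()).add(r[sensitive])
    return groups


# Hypothetical, already generalized toy table.
records = [
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
]

qi = ["zip", "age"]
print(is_k_anonymous(records, qi, 2))
# A group with a single sensitive value reveals that value to anyone
# who can place a person in the group, even though the table is 2-anonymous.
print(sensitive_values_per_group(records, qi, "condition"))
```

In this toy table the first group is 2-anonymous yet contains only one sensitive value, which is the kind of weakness the abstracts describe.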
Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy.
In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this application were collected by the U.S. Census Bureau, but due to privacy constraints, they cannot be used directly by the mapping program. Instead, we generate synthetic data that statistically mimic the original data while providing privacy guarantees. We use these synthetic data as a surrogate for the original data....
When you write papers, how many times do you want to make some citations at a place but you are not sure which papers to cite? Do you wish to have a recommendation system which can recommend a small number of good candidates for every place that you want to make some citations? In this paper, we present our initiative of building a context-aware citation recommendation system. High quality citation recommendation is challenging: not only should the citations recommended be relevant to the paper under composition, but they should also match the local contexts of the places where citations are made. Moreover, it is far from trivial to model how the topic of the whole paper and the contexts of the citation places should affect...
Recently, deep learning (DL) has emerged as a revolutionary and versatile tool transforming industry applications and generating new and improved capabilities for scientific discovery and model building. The adoption of DL in hydrology has so far been gradual, but the field is now ripe for breakthroughs. This paper suggests that DL-based methods can open up a complementary avenue toward knowledge discovery in hydrologic sciences. In the new avenue, machine-learning algorithms present competing hypotheses that are consistent with...
The Soil Moisture Active Passive (SMAP) mission has delivered valuable sensing of surface soil moisture since 2015. However, it has a short time span and an irregular revisit schedule. Utilizing a state-of-the-art time-series deep learning neural network, Long Short-Term Memory (LSTM), we created a system that predicts SMAP level-3 soil moisture data with atmospheric forcing, model-simulated moisture, and static physiographic attributes as inputs. The system removes most of the bias with model simulations and improves predicted moisture climatology,...
We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of the underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled...
In this article, we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts in an application domain, who frequently do not have expertise in privacy, to develop rigorous privacy definitions for their data sharing needs. In addition to this, the Pufferfish framework can also be used to study existing privacy definitions. We illustrate the benefits with several applications of this privacy framework: we use it to analyze differential privacy and formalize a connection to attackers...
We present a robust end-to-end neural-based model to attentively recognize text in natural images. Particularly, we focus on accurately identifying irregular (perspectively distorted or curved) text, which has not been well addressed in the previous literature. Previous research on text reading often works with regular (horizontal and frontal) text and does not adequately generalize to processing text with perspective distortion or curving effects. Our work proposes to overcome this difficulty by introducing two learning components:...
Crime is one of the most important social problems in the country, affecting public safety, children's development, and adult socioeconomic status. Understanding what factors cause higher crime is critical for policy makers in their efforts to reduce crime and increase citizens' life quality. We tackle a fundamental problem in our paper: crime rate inference at the neighborhood level. Traditional approaches have used demographics and geographical influences to estimate crime rates in a region. With the fast development of positioning technology...
We propose novel neural temporal models for predicting and synthesizing human motion, achieving state-of-the-art in modeling long-term motion trajectories while being competitive with prior work in short-term prediction and requiring significantly less computation. Key aspects of our proposed system include: 1) a novel, two-level processing architecture that aids in generating planned trajectories, 2) a simple set of easily computable features that integrate derivative information, and 3) a novel multi-objective loss...
When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to stratify a large domain into multiple regions (or regimes) and study each region separately. Traditional wisdom suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, each stratified model has access to fewer and less diverse data points. Here, through two hydrologic examples (soil moisture and streamflow), we show that conventional...
Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as k-anonymity and l-diversity are designed to thwart attacks that attempt to identify individuals in the data and to discover their sensitive information. On the other hand, the utility of such data has not been well-studied. In this paper we will discuss the shortcomings of current heuristic approaches...
Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst case. In this paper, we initiate a formal study of worst-case background knowledge. We propose a language that can express any background knowledge about the data. We provide a polynomial time algorithm to measure the amount of disclosure of sensitive information in the worst case, given that the attacker has at most k pieces of information in this language. We also...
In this paper we present a method for reasoning about privacy using the concepts of exchangeability and deFinetti's theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any scheme that uses the random worlds model, the i.i.d. model, or the tuple-independent model needs to be re-evaluated.
In this paper we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts in an application domain, who frequently do not have expertise in privacy, to develop rigorous privacy definitions for their data sharing needs. In addition to this, the Pufferfish framework can also be used to study existing privacy definitions.
Iterative algorithms, like gradient descent, are common tools for solving a variety of problems, such as model fitting. For this reason, there is interest in creating differentially private versions of them. However, their conversion to differentially private algorithms is often naive. For instance, a fixed number of iterations is chosen, the privacy budget is split evenly among them, and at each iteration, parameters are updated with a noisy gradient.
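The naive conversion described in this abstract (fixed iteration count, evenly split budget, noisy gradient at each step) can be sketched as follows. This is an illustration of that baseline only, not the method the paper proposes. The one-dimensional squared loss, the clipping bound, the learning rate, the Laplace noise, and the replace-one-record neighbor model (dataset size treated as public) are all assumptions made for the sketch.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip(value, bound):
    return max(-bound, min(bound, value))


def naive_private_gd(data, epsilon, iterations, clip_bound=1.0, lr=0.5, seed=0):
    """Estimate a scalar minimizer of sum((theta - x)^2) / 2 with noisy gradients."""
    rng = random.Random(seed)
    eps_per_step = epsilon / iterations  # budget split evenly (the naive choice)
    n = len(data)  # assumed public under the replace-one-record model
    theta = 0.0
    for _ in range(iterations):
        # Per-record gradients are clipped, so replacing one record changes
        # the sum by at most 2 * clip_bound (the assumed sensitivity).
        grad_sum = sum(clip(theta - x, clip_bound) for x in data)
        noisy_sum = grad_sum + laplace_noise(2.0 * clip_bound / eps_per_step, rng)
        theta -= lr * noisy_sum / n
    return theta


# Hypothetical data; a smaller epsilon gives a noisier estimate.
data = [0.4] * 1000
print(naive_private_gd(data, epsilon=1.0, iterations=20))
```

Under these assumptions each step is a Laplace mechanism with budget `epsilon / iterations`, so sequential composition bounds the total at `epsilon`. The fixed iteration count and even split are exactly the choices the abstract calls naive.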
Automatic recommendation of citations for a manuscript is highly valuable for scholarly activities since it can substantially improve the efficiency and quality of literature search. The prior techniques placed a considerable burden on users, who were required to provide a representative bibliography or to mark passages where citations are needed. In this paper we present a system that considerably reduces this burden: a user simply inputs a query manuscript (without a bibliography) and our system automatically finds locations where citations are needed. We show that naïve...
The increased availability of large-scale trajectory data provides rich information for the study of urban dynamics. For example, the New York City Taxi & Limousine Commission regularly releases source/destination information of taxi trips, where 173 million taxi trips were released for Year 2013 [29]. Such a big dataset provides us potential new perspectives to address the traditional traffic problems. In this article, we study the travel time estimation problem. Instead of following the traditional route-based travel time estimation, we propose to simply use a large amount of taxi trips without...
Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network accurately predicts the probability at each pixel of the three element classes. The contour detection network accurately predicts instance level "edges" around each element occurrence. We propose a conditional...