Daniel Kifer

ORCID: 0000-0002-4611-7066
Research Areas
  • Privacy-Preserving Technologies in Data
  • Cryptography and Data Security
  • Stochastic Gradient Optimization Techniques
  • Privacy, Security, and Data Protection
  • Handwritten Text Recognition Techniques
  • Adversarial Robustness in Machine Learning
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Mobile Crowdsensing and Crowdsourcing
  • Anomaly Detection Techniques and Applications
  • Landslides and related hazards
  • Data-Driven Disease Surveillance
  • Topic Modeling
  • Neural Networks and Applications
  • Seismology and Earthquake Studies
  • Data Management and Algorithms
  • Complexity and Algorithms in Graphs
  • Internet Traffic Analysis and Secure E-voting
  • Natural Language Processing Techniques
  • Flood Risk Assessment and Management
  • Hydrological Forecasting Using AI
  • Census and Population Estimation
  • Multimodal Machine Learning Applications
  • Model Reduction and Neural Networks
  • Time Series Analysis and Forecasting

Pennsylvania State University
2016-2025

United States Census Bureau
2018-2025

Yahoo (United States)
2009-2018

Adobe Systems (United States)
2017

Park University
2015

University of Waterloo
2013

Cornell University
2002-2011

Yahoo (Spain)
2008

Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes. In this article, we show using two simple attacks that a k-anonymized dataset has some subtle but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little diversity...

10.1145/1217299.1217302 article EN ACM Transactions on Knowledge Discovery from Data 2007-03-01
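
The k-anonymity condition in the abstract above, and the "little diversity" problem it mentions, can be illustrated with a minimal sketch. The function names, the record layout (a list of dicts), and the toy attribute names `zip`, `age`, and `condition` are assumptions made for illustration; this is not code from the article.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def homogeneous_groups(records, quasi_identifiers, sensitive):
    """Quasi-identifier groups in which the sensitive attribute takes only one value."""
    values = {}
    for r in records:
        key = tuple(r[a] for a in quasi_identifiers)
        values.setdefault(key, set()).add(r[sensitive])
    return [key for key, vals in values.items() if len(vals) == 1]


# Hypothetical, already-generalized toy table.
records = [
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "130**", "age": "<30", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
]

print(is_k_anonymous(records, ["zip", "age"], 2))
print(homogeneous_groups(records, ["zip", "age"], "condition"))
```

The toy table is 2-anonymous, yet anyone who knows a person falls in the first group learns that person's condition, because the group has no diversity in the sensitive attribute.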

Publishing data about individuals without revealing sensitive information about them is an important problem. In recent years, a new definition of privacy called k-anonymity has gained popularity. In a k-anonymized dataset, each record is indistinguishable from at least k−1 other records with respect to certain "identifying" attributes. In this paper we show with two simple attacks that a k-anonymized dataset has some subtle, but severe privacy problems. First, an attacker can discover the values of sensitive attributes when there is little...

10.1109/icde.2006.1 article EN 2006-01-01

Differential privacy is a powerful tool for providing privacy-preserving noisy query answers over statistical databases. It guarantees that the distribution of noisy query answers changes very little with the addition or deletion of any tuple. It is frequently accompanied by popularized claims that it provides privacy without any assumptions about the data and that it protects against attackers who know all but one record. In this paper we critically analyze the privacy protections offered by differential privacy.

10.1145/1989323.1989345 article EN 2011-06-12
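
For reference, the guarantee paraphrased in the abstract above is usually written as follows. This is the standard textbook form of ε-differential privacy, given here as an aid to the reader; the symbols M (the randomized mechanism), D and D' (neighboring databases), S (a set of outputs), and ε (the privacy parameter) are notation assumed for this note and do not appear in the abstract.

```latex
% M is \varepsilon-differentially private if, for all databases D and D'
% that differ by the addition or deletion of one tuple, and for every
% set S of possible outputs:
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \, \Pr[\,M(D') \in S\,]
\]
```

A small ε means the two output distributions are close, which is the sense in which the distribution of noisy answers "changes very little".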

In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this application were collected by the U.S. Census Bureau, but due to privacy constraints, they cannot be used directly by the mapping program. Instead, we generate synthetic data that statistically mimic the original data while providing privacy guarantees. We use these synthetic data as a surrogate for the original data....

10.1109/icde.2008.4497436 article EN 2008-04-01

When you write papers, how many times do you want to make some citations at a place but you are not sure which papers to cite? Do you wish to have a recommendation system which can recommend a small number of good candidates for every place that you want to make some citations? In this paper, we present our initiative of building a context-aware citation recommendation system. High quality citation recommendation is challenging: not only should the citations recommended be relevant to the paper under composition, but they should also match the local contexts of the places where citations are made. Moreover, it is far from trivial to model how the topic of the whole paper and the contexts of the citation places should affect...

10.1145/1772690.1772734 article EN 2010-04-26

Recently, deep learning (DL) has emerged as a revolutionary and versatile tool transforming industry applications and generating new and improved capabilities for scientific discovery and model building. The adoption of DL in hydrology has so far been gradual, but the field is now ripe for breakthroughs. This paper suggests that DL-based methods can open up a complementary avenue toward knowledge discovery in hydrologic sciences. In the new avenue, machine-learning algorithms present competing hypotheses that are consistent with...

10.5194/hess-22-5639-2018 article EN cc-by Hydrology and earth system sciences 2018-11-01

The Soil Moisture Active Passive (SMAP) mission has delivered valuable sensing of surface soil moisture since 2015. However, it has a short time span and an irregular revisit schedule. Utilizing a state-of-the-art time-series deep learning neural network, Long Short-Term Memory (LSTM), we created a system that predicts SMAP level-3 soil moisture data with atmospheric forcing, model-simulated moisture, and static physiographic attributes as inputs. The system removes most of the bias with model simulations and improves predicted moisture climatology,...

10.1002/2017gl075619 article EN Geophysical Research Letters 2017-10-16

We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled...

10.1109/cvpr.2017.462 article EN 2017-07-01

In this article, we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts in an application domain, who frequently do not have expertise in privacy, to develop rigorous privacy definitions for their data sharing needs. In addition to this, the Pufferfish framework can also be used to study existing privacy definitions. We illustrate the benefits with several applications of this privacy framework: we use it to analyze differential privacy and formalize a connection to attackers...

10.1145/2514689 article EN ACM Transactions on Database Systems 2014-01-01

We present a robust end-to-end neural-based model to attentively recognize text in natural images. Particularly, we focus on accurately identifying irregular (perspectively distorted or curved) text, which has not been well addressed in the previous literature. Previous research on text reading often works with regular (horizontal and frontal) text and does not adequately generalize to processing text with perspective distortion and curving effects. Our work proposes to overcome this difficulty by introducing two learning components:...

10.24963/ijcai.2017/458 article EN 2017-07-28

Crime is one of the most important social problems in the country, affecting public safety, children's development, and adult socioeconomic status. Understanding what factors cause higher crime is critical for policy makers in their efforts to reduce crime and increase citizens' life quality. We tackle a fundamental problem in our paper: crime rate inference at the neighborhood level. Traditional approaches have used demographics and geographical influences to estimate crime rates in a region. With the fast development of positioning technology...

10.1145/2939672.2939736 article EN 2016-08-08

We propose novel neural temporal models for predicting and synthesizing human motion, achieving state-of-the-art in modeling long-term motion trajectories while being competitive with prior work in short-term prediction and requiring significantly less computation. Key aspects of our proposed system include: 1) a novel, two-level processing architecture that aids in generating planned trajectories, 2) a simple set of easily computable features that integrate derivative information, and 3) a novel multi-objective loss...

10.1109/cvpr.2019.01239 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to stratify a large domain into multiple regions (or regimes) and study each region separately. Traditional wisdom suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, each stratified model has access to fewer and less diverse data points. Here, through two hydrologic examples (soil moisture and streamflow), we show that conventional...

10.1029/2021wr029583 article EN publisher-specific-oa Water Resources Research 2022-03-17

Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as k-anonymity and l-diversity are designed to thwart attacks that attempt to identify individuals in the data and to discover their sensitive information. On the other hand, the utility of such data has not been well-studied. In this paper we will discuss the shortcomings of current heuristic approaches...

10.1145/1142473.1142499 article EN 2006-06-27

Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst-case. In this paper, we initiate a formal study of worst-case background knowledge. We propose a language that can express any background knowledge about the data. We provide a polynomial time algorithm to measure the amount of disclosure of sensitive information in the worst case, given that the attacker has at most k pieces of information in this language. We also...

10.1109/icde.2007.367858 article EN 2007-04-01

In this paper we present a method for reasoning about privacy using the concepts of exchangeability and deFinetti's theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any scheme that uses the random worlds model, i.i.d. model, or tuple-independent model needs to be re-evaluated.

10.1145/1559845.1559861 article EN 2009-06-29

In this paper we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions that are customized to the needs of a given application. The goal of Pufferfish is to allow experts in an application domain, who frequently do not have expertise in privacy, to develop rigorous privacy definitions for their data sharing needs. In addition to this, the Pufferfish framework can also be used to study existing privacy definitions.

10.1145/2213556.2213571 article EN 2012-05-21

Iterative algorithms, like gradient descent, are common tools for solving a variety of problems, such as model fitting. For this reason, there is interest in creating differentially private versions of them. However, their conversion to differentially private algorithms is often naive. For instance, a fixed number of iterations is chosen, the privacy budget is split evenly among them, and at each iteration, parameters are updated with a noisy gradient.

10.1145/3219819.3220076 article EN 2018-07-19
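
The naive conversion described in the abstract above (fixed iteration count, evenly split budget, noisy gradient at each step) can be sketched roughly as follows. This illustrates only that baseline, not the authors' method. Everything specific here is an assumption made for the sketch: squared loss on a linear model, L1 clipping of per-example gradients, Laplace noise, add/remove-one-record neighbors, a publicly known dataset size, and the names `naive_private_gd`, `clip_l1`, and `laplace`.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip_l1(vec, bound):
    # Rescale so the L1 norm is at most `bound`; this bounds one record's influence.
    norm = sum(abs(v) for v in vec)
    factor = min(1.0, bound / norm) if norm > 0 else 1.0
    return [v * factor for v in vec]


def naive_private_gd(data, dim, epsilon, iterations, clip=1.0, lr=0.1, seed=0):
    """Gradient descent on 0.5 * (w.x - y)^2 with an evenly split privacy budget."""
    rng = random.Random(seed)
    eps_step = epsilon / iterations  # even split; basic composition sums back to epsilon
    w = [0.0] * dim
    for _ in range(iterations):
        total = [0.0] * dim
        for x, y in data:
            residual = sum(wi * xi for wi, xi in zip(w, x)) - y
            grad = clip_l1([residual * xi for xi in x], clip)
            total = [t + g for t, g in zip(total, grad)]
        # The clipped gradient sum has L1 sensitivity `clip`, so Laplace noise
        # with scale clip / eps_step is added to each coordinate.
        noisy = [t + laplace(clip / eps_step, rng) for t in total]
        w = [wi - lr * g / len(data) for wi, g in zip(w, noisy)]
    return w


# Hypothetical toy data: one feature, every target equal to 2 * x.
data = [([1.0], 2.0)] * 50
print(naive_private_gd(data, dim=1, epsilon=1.0, iterations=10))
```

Because the budget is fixed per step in advance, early and late iterations receive the same noise level regardless of how much progress each step makes, which is the rigidity the abstract calls naive.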

Automatic recommendation of citations for a manuscript is highly valuable for scholarly activities since it can substantially improve the efficiency and quality of literature search. The prior techniques placed a considerable burden on users, who were required to provide a representative bibliography or to mark passages where citations are needed. In this paper we present a system that considerably reduces this burden: a user simply inputs a query manuscript (without a bibliography) and our system automatically finds locations where citations are needed. We show that naïve...

10.1145/1935826.1935926 article EN 2011-02-01

The increased availability of large-scale trajectory data provides rich information for the study of urban dynamics. For example, the New York City Taxi & Limousine Commission regularly releases source/destination information of taxi trips, where 173 million taxi trips were released for Year 2013 [29]. Such a big dataset provides us potential new perspectives to address the traditional traffic problems. In this article, we study the travel time estimation problem. Instead of following the traditional route-based travel time estimation, we propose to simply use a large amount of taxi trips without...

10.1145/3293317 article EN ACM Transactions on Intelligent Systems and Technology 2019-01-12

Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network accurately predicts the probability at each pixel of the three element classes. The contour detection network accurately predicts instance level "edges" around each element occurrence. We propose a conditional...

10.1109/icdar.2017.50 article EN 2017-11-01