- Data Management and Algorithms
- Algorithms and Data Compression
- Advanced Database Systems and Queries
- Distributed systems and fault tolerance
- Caching and Content Delivery
- Internet Traffic Analysis and Secure E-voting
- Error Correcting Code Techniques
- Network Security and Intrusion Detection
- Topic Modeling
- Complexity and Algorithms in Graphs
- Data Stream Mining Techniques
- Advanced Image and Video Retrieval Techniques
- Optimization and Search Problems
- Constraint Satisfaction and Optimization
- Complex Network Analysis Techniques
- Water Quality Monitoring Technologies
- Network Packet Processing and Optimization
- DNA and Biological Computing
- Natural Language Processing Techniques
- Water Systems and Optimization
- Anomaly Detection Techniques and Applications
- Cloud Computing and Resource Management
- Advanced Wireless Communication Techniques
- semigroups and automata theory
- Human Mobility and Location-Based Analysis
University of Michigan
2025
Denison University
2011-2024
Indian Institute of Technology Hyderabad
2021-2024
Johns Hopkins University
2014
Georgia Institute of Technology
2009-2011
University of Rochester
2006-2008
Using entropy of traffic distributions has been shown to aid a wide variety network monitoring applications such as anomaly detection, clustering reveal interesting patterns, and classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU memory requirements. In paper, we investigate the problem estimating streaming computation model. We give lower bounds for problem, showing neither approximation nor...
We propose the k -representative regret minimization query ( -regret) as an operation to support multi-criteria decision making. Like top- , -regret assumes that users have some utility or scoring functions; however, it never asks provide such functions. skyline, filters out a set of interesting points from potentially large database based on users' criteria; overwhelms by outputting too many tuples. In particular, for any number and class functions, outputs tuples tries minimize maximum...
We study the notion of regret ratio proposed in [19] Nanongkai et al. [VLDB10] to deal with multi-criteria decision making database systems. The minimization query was shown have features both skyline and top-k: it does not need information from user but still controls output size. While this approach is suitable for obtaining a reasonably small ratio, open whether one can make arbitrarily small. Moreover, remains reasonable questions be asked users order improve efficiency process.
Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed the literature: top- k and skyline queries. A query requires users to specify their utility functions beforehand then returns users. does not require any function but it puts no control on number of returned Recently, k-regret was received attention community because output size controllable, thus avoids those deficiencies Specifically, that...
Using entropy of traffic distributions has been shown to aid a wide variety network monitoring applications such as anomaly detection, clustering reveal interesting patterns, and classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU memory requirements. In paper, we investigate the problem estimating streaming computation model. We give lower bounds for problem, showing neither approximation nor...
Entropy has recently gained considerable significance as an important metric for network measurement. Previous research shown its utility in clustering traffic and detecting anomalies. While measuring the entropy of observed at a single point already been studied, interesting open problem is to measure between every origin-destination pair. In this paper, we propose first solution challenging problem. Our sketch builds upon extends Lp Indyk with significant additional innovations. We present...
The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets most interesting (undominated) tuples a database. However, since the skyline is often very large, much research effort been devoted to identifying smaller subset (say k) "representative skyline" points. Several different definitions representative have considered. Most these formulations intuitive that they try achieve some kind clustering "spread" over entire skyline,...
When faced with a database containing millions of tuples, an end user might be only interested in finding his/her (close to) favorite tuple the database. Recently, regret minimization query was proposed to obtain small subset from that fits user's needs, which are expressed through unknown utility function. Specifically, it minimizes "regret'' level user, we quantify as ratio if s/he gets best selected but not among all tuples We study how enhance interactions : when presented number (which...
In exploring representative databases, a primary issue has been finding accurate models of user preferences. Given this, our work generalizes the method regret minimization as proposed by Nanongkai et al. to include nonlinear utility functions. Regret is an approach for selecting k points from database such that every user's ideal point in entire similar one points. This combines benefits methods top- and skyline; it controls size output but does not require knowledge users' Prior with...
We consider external algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst case guarantee while performing well on average. Due the nature of disks, it desirable that such access input as stream (even if in multiple passes). Using tools randomness, proved be useful many applications, we present efficient multi-pass streaming algorithm, RAND, computation. As far are aware, RAND first randomized literature. near-optimal model, which...
We propose space-efficient algorithms for performing the Kolmogorov-Smirnov test on streaming data. The is a non-parametric measuring strength of hypothesis that some data drawn from fixed distribution (one-sample test), or two sets are same (two-sample test). Unlike other tests, does not assume has known form (e.g., it normal), and in two-sample case need know anything about distribution, than continuous. Motivated by challenges big data, we present both one-sample tests processed stream....
In today's Internet applications or sensor networks we often encounter large amounts of data spread over many physically distributed nodes. The sheer volume the and bandwidth constraints make it impractical to send all one central node for query processing. Finding icebergs—elements that may have low frequency at individual nodes but high aggregate frequency—is a problem arises commonly in practice. this paper present novel algorithm with two notable properties. First, its accuracy guarantee...
We show how rapidly changing textual streams such as Twitter can be modelled in fixed space.Our approach is based upon a randomised algorithm called Exponential Reservoir Sampling, unexplored by this community until now.Using language models over and Newswire testbed, our experimental results on perplexity support the intuition that recently observed data generally outweighs seen past, but at times, past have valuable signals enabling better modelling of present.
Viral marketing is a powerful tool for online advertising and sales because it exploits the influence people have on one another. While this technique has been beneficial advertisers, not shown how social network providers such as Facebook Twitter can benefit from it. In paper, we initiate study of sponsored viral where provider that complete knowledge its hired by several advertisers to provide marketing. Each advertiser own budget fixed amount they are willing pay each user adopts their...
We study error estimating codes with the goal of establishing better bounds for theoretical and empirical overhead such schemes. explore idea using sketch data structures this problem, show that tug-of-war gives an asymptotically optimal solution. The optimality our algorithms are proved communication complexity lower bound techniques. then propose a novel enhancement greatly reduces realistic rates. Our analysis assertions supported by extensive experimental evaluation.
Discovering icebergs in distributed streams of data is an important problem for a number applications networking and databases. While previous work has concentrated on measuring these the non-distributed streaming case or non-streaming case, we present general framework that allows processing across multiple data. We compare several state-of-the-art algorithms estimating local elephants individual streams. However, since iceberg may be hidden by being many different streams, add sampling...
Viral marketing is a powerful tool for online advertising and sales because it exploits the influence people have on one another. While this technique has been beneficial advertisers, not shown how social network providers such as Facebook Twitter can benefit from it. In paper, we initiate study of sponsored viral where provider that complete knowledge its hired by several advertisers to provide marketing. Each advertiser own budget fixed amount they are willing pay each user adopts their...