NFDI4DS | UHH-SEMS - Publication Details

Data streaming algorithms for estimating entropy of network traffic

OPENALEX - Publications

Ashwin Lall Vyas Sekar Mitsunori Ogihara Jun Xu Hui Zhang

Using entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. In this paper, we investigate the problem of estimating the entropy in a streaming computation model. We give lower bounds for this problem, showing that neither approximation nor...

10.1145/1140103.1140295 article EN ACM SIGMETRICS Performance Evaluation Review 2006-06-26
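
For reference, the quantity this abstract is concerned with can be computed exactly when all per-item counts fit in memory. The sketch below is only that offline baseline, assuming the empirical Shannon entropy (in bits) of the observed item frequencies; it is not the streaming algorithm from the paper, and the function name `empirical_entropy` is hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(items):
    """Exact empirical Shannon entropy (in bits) of a finite sequence.

    Offline baseline only: it stores one counter per distinct item,
    which is exactly the memory cost a streaming estimator avoids.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # convention chosen here for an empty stream
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Example: packets labelled by a (hypothetical) flow identifier.
packets = ["flowA", "flowB", "flowA", "flowB"]
entropy_bits = empirical_entropy(packets)
```

A stream dominated by a single flow yields entropy near zero, while traffic spread evenly over many flows yields high entropy, which is why shifts in this value are used as an anomaly signal.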

Regret-minimizing representative databases

OPENALEX - Publications

Danupon Nanongkai Atish Das Sarma Ashwin Lall Richard J. Lipton Jun Xu

We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring functions; however, it never asks the users to provide such functions. Like skyline, it filters out a set of interesting points from a potentially large database based on the users' criteria; however, it never overwhelms the users by outputting too many tuples. In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the maximum...

10.14778/1920841.1920980 article EN Proceedings of the VLDB Endowment 2010-09-01

Interactive regret minimization

OPENALEX - Publications

Danupon Nanongkai Ashwin Lall Atish Das Sarma Kazuhisa Makino

We study the notion of regret ratio proposed in [19] (Nanongkai et al. [VLDB10]) to deal with multi-criteria decision making in database systems. The regret minimization query was shown to have features of both skyline and top-k: it does not need information from the user but still controls the output size. While this approach is suitable for obtaining a reasonably small regret ratio, it is open whether one can make the regret ratio arbitrarily small. Moreover, it remains open whether reasonable questions can be asked of the users in order to improve the efficiency of the process.

10.1145/2213836.2213850 article EN 2012-05-20

Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality

OPENALEX - Publications

Min Xie Raymond Chi-Wing Wong Jian Li Cheng Long Ashwin Lall

Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns k tuples to the users. A skyline query does not require any utility function, but it puts no control on the number of returned tuples. Recently, the k-regret query has received attention from the community because its output size is controllable, and thus it avoids those deficiencies. Specifically, that...

10.1145/3183713.3196903 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Data streaming algorithms for estimating entropy of network traffic

OPENALEX - Publications

Ashwin Lall Vyas Sekar Mitsunori Ogihara Jun Xu Hui Zhang

Using entropy of traffic distributions has been shown to aid a wide variety of network monitoring applications such as anomaly detection, clustering to reveal interesting patterns, and traffic classification. However, realizing this potential benefit in practice requires accurate algorithms that can operate on high-speed links, with low CPU and memory requirements. In this paper, we investigate the problem of estimating the entropy in a streaming computation model. We give lower bounds for this problem, showing that neither approximation nor...

10.1145/1140277.1140295 article EN 2006-06-26

A data streaming algorithm for estimating entropies of OD flows

OPENALEX - Publications

Haiquan Zhao Ashwin Lall Mitsunori Ogihara Oliver Spatscheck Jia Wang and 1 more

Entropy has recently gained considerable significance as an important metric for network measurement. Previous research has shown its utility in clustering traffic and detecting anomalies. While measuring the entropy of the traffic observed at a single point has already been studied, an interesting open problem is to measure the entropy of the traffic between every origin-destination pair. In this paper, we propose the first solution to this challenging problem. Our sketch builds upon and extends the Lp sketch of Indyk with significant additional innovations. We present...

10.1145/1298306.1298345 article EN 2007-10-24

Representative skylines using threshold-based preference distributions

OPENALEX - Publications

Atish Das Sarma Ashwin Lall Danupon Nanongkai Richard J. Lipton Jim Xu

The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of the most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) "representative skyline" points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering "spread" over the entire skyline,...

10.1109/icde.2011.5767873 article EN 2011-04-01
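
The notion of an undominated tuple used in this abstract can be illustrated with a brute-force skyline computation. This is a minimal sketch assuming numeric tuples where larger is better in every attribute; the helper names `dominates` and `skyline` are hypothetical, and the quadratic scan is for illustration only, not a method from any of the listed papers.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(points):
    """Return the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Example: (quality, cheapness) scores for five hypothetical products.
products = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
undominated = skyline(products)  # [(1, 3), (2, 2), (3, 1)]
```

Even in this tiny example three of five points survive, which hints at why the skyline can be too large to show a user and why a smaller representative subset is of interest.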

An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query

OPENALEX - Publications

Min Xie Raymond Chi-Wing Wong Ashwin Lall

10.1007/s00778-019-00570-z article EN The VLDB Journal 2019-09-14

Strongly Truthful Interactive Regret Minimization

OPENALEX - Publications

Min Xie Raymond Chi-Wing Wong Ashwin Lall

When faced with a database containing millions of tuples, an end user might be only interested in finding his/her (close to) favorite tuple in the database. Recently, a regret minimization query was proposed to obtain a small subset from the database that fits the user's needs, which are expressed through an unknown utility function. Specifically, it minimizes the "regret" level of a user, which we quantify as the regret ratio if s/he gets the best tuple among the selected tuples but not the best tuple among all tuples in the database. We study how to enhance the regret minimization query with user interactions: when presented with a number of tuples (which...

10.1145/3299869.3300068 article EN Proceedings of the 2019 International Conference on Management of Data 2019-06-18
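
The regret ratio described in this abstract compares the best tuple in a selected subset with the best tuple in the whole database. The sketch below assumes linear utility functions given as non-negative weight vectors and evaluates only a finite, user-supplied sample of them; both assumptions, and the function names, are illustrative rather than taken from the paper.

```python
def regret_ratio(database, selected, weights):
    """Relative utility loss from seeing only `selected` instead of `database`."""
    def utility(point):
        return sum(w * x for w, x in zip(weights, point))

    best_overall = max(utility(p) for p in database)
    best_selected = max(utility(p) for p in selected)
    if best_overall <= 0:
        return 0.0  # convention chosen here to avoid dividing by zero
    return (best_overall - best_selected) / best_overall


def max_regret_ratio(database, selected, weight_vectors):
    """Worst regret ratio over a finite sample of utility functions."""
    return max(regret_ratio(database, selected, w) for w in weight_vectors)


database = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
selected = [(0.6, 0.6)]
sampled_users = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
worst = max_regret_ratio(database, selected, sampled_users)  # about 0.4
```

Here a user who cares only about the first attribute loses 40% of the achievable utility, while a user who weights both attributes equally loses nothing.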

k-regret queries with nonlinear utilities

OPENALEX - Publications

Taylor Kessler Faulkner Will Brackenbury Ashwin Lall

In exploring representative databases, a primary issue has been finding accurate models of user preferences. Given this, our work generalizes the method of regret minimization as proposed by Nanongkai et al. to include nonlinear utility functions. Regret minimization is an approach for selecting k representative points from a database such that every user's ideal point in the entire database is similar to one of the k points. This approach combines the benefits of the top-k and skyline methods; it controls the size of the output but does not require knowledge of users' preferences. Prior work with...

10.14778/2831360.2831364 article EN Proceedings of the VLDB Endowment 2015-09-01

Exploring Gradient Descent Optimization Algorithms for Early-Stage Training in Machine Learning Models

OPENALEX - Publications

Kristi Läll Ashwin Lall

10.2139/ssrn.5082092 preprint EN 2025-01-01

Validating Image Captioning Models Using Text-to-Image Algorithms via Generative AI

OPENALEX - Publications

Kristi Läll Ashwin Lall

10.1109/icccit62592.2025.10928102 article EN 2025-02-07

Advancing Paraphrase Generation through Deep Reinforcement Learning

OPENALEX - Publications

Kristi Läll Ashwin Lall

10.1109/icccit62592.2025.10927955 article EN 2025-02-07

Randomized multi-pass streaming skyline algorithms

OPENALEX - Publications

Atish Das Sarma Ashwin Lall Danupon Nanongkai Jun Xu

We consider external algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst-case guarantee while performing well on average. Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using the tools of randomness, proved to be useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature. RAND is near-optimal in the streaming model, which...

10.14778/1687627.1687638 article EN Proceedings of the VLDB Endowment 2009-08-01

Data streaming algorithms for the Kolmogorov-Smirnov test

OPENALEX - Publications

Ashwin Lall

We propose space-efficient algorithms for performing the Kolmogorov-Smirnov test on streaming data. The Kolmogorov-Smirnov test is a non-parametric test for measuring the strength of a hypothesis that some data is drawn from a fixed distribution (one-sample test), or that two sets of data are drawn from the same distribution (two-sample test). Unlike other tests, the Kolmogorov-Smirnov test does not assume that the data has a known form (e.g., that it is normal), and in the two-sample case does not need to know anything about the distribution, other than that it is continuous. Motivated by the challenges of big data, we present algorithms for both the one-sample and two-sample tests when the data is processed as a stream....

10.1109/bigdata.2015.7363746 article EN 2015 IEEE International Conference on Big Data (Big Data) 2015-10-01
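
The two-sample test mentioned in this abstract is based on the largest gap between two empirical distribution functions. The sketch below computes that statistic exactly by keeping and sorting both samples in full, which is the memory cost a streaming algorithm is designed to avoid; it is an offline illustration with a hypothetical function name, not the paper's algorithm.

```python
import bisect


def ks_two_sample_statistic(xs, ys):
    """Exact two-sample Kolmogorov-Smirnov statistic D = sup |F_x(v) - F_y(v)|.

    Both empirical CDFs are step functions that change only at sample
    values, so checking every observed value is sufficient.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    n, m = len(xs_sorted), len(ys_sorted)
    largest_gap = 0.0
    for v in xs_sorted + ys_sorted:
        cdf_x = bisect.bisect_right(xs_sorted, v) / n  # fraction of xs <= v
        cdf_y = bisect.bisect_right(ys_sorted, v) / m  # fraction of ys <= v
        largest_gap = max(largest_gap, abs(cdf_x - cdf_y))
    return largest_gap


d = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # 0.5
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.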

Global iceberg detection over distributed data streams

OPENALEX - Publications

Haiquan Zhao Ashwin Lall Mitsunori Ogihara Jun Xu

In today's Internet applications or sensor networks we often encounter large amounts of data spread over many physically distributed nodes. The sheer volume of the data and bandwidth constraints make it impractical to send all the data to one central node for query processing. Finding icebergs (elements that may have low frequency at individual nodes but high aggregate frequency) is a problem that arises commonly in practice. In this paper we present a novel algorithm with two notable properties. First, its accuracy guarantee...

10.1109/icde.2010.5447825 article EN 2010 IEEE 26th International Conference on Data Engineering (ICDE) 2010-01-01

Exponential Reservoir Sampling for Streaming Language Models

OPENALEX - Publications

Miles Osborne Ashwin Lall Benjamin Van Durme

We show how rapidly changing textual streams such as Twitter can be modelled in fixed space. Our approach is based upon a randomised algorithm called Exponential Reservoir Sampling, unexplored by this community until now. Using language models over Twitter and Newswire as a testbed, our experimental results based on perplexity support the intuition that recently observed data generally outweighs that seen in the past, but that at times, the past can have valuable signals enabling better modelling of the present.

10.3115/v1/p14-2112 article EN cc-by 2014-01-01

Social Network Monetization via Sponsored Viral Marketing

OPENALEX - Publications

Parinya Chalermsook Atish Das Sarma Ashwin Lall Danupon Nanongkai

Viral marketing is a powerful tool for online advertising and sales because it exploits the influence people have on one another. While this technique has been beneficial for advertisers, it has not been shown how social network providers such as Facebook and Twitter can benefit from it. In this paper, we initiate the study of sponsored viral marketing, where a social network provider that has complete knowledge of its network is hired by several advertisers to provide viral marketing. Each advertiser has its own budget and a fixed amount they are willing to pay for each user that adopts their...

10.1145/2745844.2745853 article EN 2015-06-08

A simpler and better design of error estimating coding

OPENALEX - Publications

Nan Hua Ashwin Lall Baochun Li Jun Xu

We study error estimating codes with the goal of establishing better bounds for the theoretical and empirical overhead of such schemes. We explore the idea of using sketch data structures for this problem, and show that the tug-of-war sketch gives an asymptotically optimal solution. The optimality of our algorithms is proved using communication complexity lower bound techniques. We then propose a novel enhancement that greatly reduces the overhead for realistic error rates. Our analysis and assertions are supported by extensive experimental evaluation.

10.1109/infcom.2012.6195624 article EN 2012-03-01

Uncovering Global Icebergs in Distributed Streams: Results and Implications

OPENALEX - Publications

Guanyao Huang Ashwin Lall Chen‐Nee Chuah Jun Xu

Discovering icebergs in distributed streams of data is an important problem for a number of applications in networking and databases. While previous work has concentrated on measuring these icebergs in the non-distributed streaming case or in the non-streaming distributed case, we present a general framework that allows for distributed processing across multiple streams of data. We compare several state-of-the-art streaming algorithms for estimating local elephants in the individual streams. However, since an iceberg may be hidden by being distributed across many different streams, we add a sampling...

10.1007/s10922-010-9186-5 article EN cc-by-nc Journal of Network and Systems Management 2010-10-23

Social Network Monetization via Sponsored Viral Marketing

OPENALEX - Publications

Parinya Chalermsook Atish Das Sarma Ashwin Lall Danupon Nanongkai

Viral marketing is a powerful tool for online advertising and sales because it exploits the influence people have on one another. While this technique has been beneficial for advertisers, it has not been shown how social network providers such as Facebook and Twitter can benefit from it. In this paper, we initiate the study of sponsored viral marketing, where a social network provider that has complete knowledge of its network is hired by several advertisers to provide viral marketing. Each advertiser has its own budget and a fixed amount they are willing to pay for each user that adopts their...

10.1145/2796314.2745853 article EN ACM SIGMETRICS Performance Evaluation Review 2015-06-15