Parag Agrawal

ORCID: 0009-0005-0759-8484
Research Areas
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Topic Modeling
  • Semantic Web and Ontologies
  • Data Quality and Management
  • Natural Language Processing Techniques
  • Misinformation and Its Impacts
  • Cloud Computing and Resource Management
  • Parallel Computing and Optimization Techniques
  • Service-Oriented Architecture and Web Services
  • AI in Service Interactions
  • Web Data Mining and Analysis
  • Advanced Data Storage Technologies
  • Scientific Computing and Data Management
  • Sentiment Analysis and Opinion Mining
  • Software System Performance and Reliability
  • Recommender Systems and Techniques
  • Digital Marketing and Social Media
  • Data Mining Algorithms and Applications
  • Caching and Content Delivery
  • Web Application Security Vulnerabilities
  • Software Engineering Research
  • Real-Time Systems Scheduling
  • Genetic and Kidney Cyst Diseases
  • Infectious Diseases and Mycology

Index Medical College, Hospital & Research Centre
2024

Tata Consultancy Services (India)
2021-2023

LinkedIn (United States)
2019-2023

Poornima University
2021

Microsoft (Finland)
2020

Microsoft Research (United Kingdom)
2019-2020

Microsoft (United States)
2018

Microsoft (India)
2017

Twitter (United States)
2014

Stanford University
2006-2011

Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x...

10.1145/1713254.1713276 article EN ACM SIGOPS Operating Systems Review 2010-01-27

In the Trio project at Stanford, we are building a new kind of database management system: one in which data, uncertainty, and data lineage are all first-class citizens. Trio is based on an extended relational model called ULDBs, and it supports a SQL-based query language called TriQL. Trio was motivated by a number of applications, including scientific data management, data cleaning and integration, information extraction systems, and others. We have completed an initial working prototype of the system. We will demonstrate our prototype, illustrating it through two...

10.5555/1182635.1164231 article EN 2006-09-01

With scalable high-performance storage entirely in DRAM, RAMCloud will enable a new breed of data-intensive applications.

10.1145/1965724.1965751 article EN Communications of the ACM 2011-06-28

In this paper, we solve the following data summarization problem: given a multi-dimensional data set augmented with a binary attribute, how can we construct an interpretable and informative summary of the factors affecting the binary attribute in terms of the combinations of values of the dimension attributes? We refer to such summaries as explanation tables. We show the hardness of constructing optimally-informative explanation tables from data, and we propose effective and efficient heuristics. The proposed heuristics are based on sampling and include optimizations...

10.14778/2735461.2735467 article EN Proceedings of the VLDB Endowment 2014-09-01

We study how best to schedule scans of large data files in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these files by sharing scans of the same file as aggressively as possible, without imposing undue wait time on individual jobs. This scheduling problem arises in batch data processing environments such as Map-Reduce systems, some of which handle tens of thousands of processing requests daily, over a shared set of files. As we demonstrate, conventional scheduling techniques such as shortest-job-first do not perform well in the presence of cross-job...

10.14778/1453856.1453960 article EN Proceedings of the VLDB Endowment 2008-08-01

The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g. BigTable, Dynamo, Cassandra, etc.), are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive scale distributed database. For web applications, minimizing update...

10.1145/1559845.1559866 article EN 2009-06-29

Prior work has identified set based comparisons as a useful primitive for supporting a wide variety of similarity functions in record matching. Accordingly, various techniques have been proposed to improve the performance of set similarity lookups. However, this body of work focuses almost exclusively on symmetric notions of set similarity. In this paper, we study the indexing problem for the asymmetric Jaccard containment similarity function that is an error-tolerant variation of set containment. We enhance this similarity function to also account for string transformations that reflect synonyms...

10.1145/1807167.1807267 article EN 2010-06-06
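
The asymmetry mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common unweighted definition of Jaccard containment, |A ∩ B| / |A|, and whitespace tokenization; the function names and sample strings are hypothetical, and the paper's own weighting, indexing, and string-transformation handling are not reproduced here.

```python
def jaccard_containment(query_tokens, record_tokens):
    """Fraction of the query's distinct tokens that also occur in the record.

    Asymmetric: swapping the arguments generally changes the result.
    """
    query, record = set(query_tokens), set(record_tokens)
    if not query:
        return 0.0
    return len(query & record) / len(query)


def jaccard_similarity(a_tokens, b_tokens):
    """Symmetric Jaccard similarity, shown for comparison."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical example strings, tokenized on whitespace.
short = "microsoft corp".split()
long_ = "microsoft corp redmond wa usa".split()

contained = jaccard_containment(short, long_)   # every short token is in long_
reverse = jaccard_containment(long_, short)     # only 2 of 5 long_ tokens are in short
symmetric = jaccard_similarity(short, long_)    # 2 shared tokens out of 5 distinct
```

Here the short string is fully contained in the long one (containment 1.0), while the reverse direction and the symmetric Jaccard score are both 0.4, which is why a containment measure suits matching a short query against a longer record.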

There has been considerable past work studying data integration and uncertain data in isolation. We develop the foundations for local-as-view (LAV) data integration when the sources being integrated are uncertain. We motivate two distinct settings for uncertain-data integration. We then define containment of uncertain databases in these settings, which allows us to express uncertain sources as views over a virtual mediated uncertain database. Next, we define consistency of a set of uncertain sources and show intractability of consistency-checking. We identify an interesting special case for which consistency-checking...

10.14778/1920841.1920976 article EN Proceedings of the VLDB Endowment 2010-09-01

Existing Machine Learning techniques yield close to human performance on text-based classification tasks. However, the presence of multi-modal noise in chat data such as emoticons, slang, spelling mistakes, code-mixed data, etc. makes existing deep-learning solutions perform poorly. The inability of deep-learning systems to robustly capture these covariates puts a cap on their performance. We propose NELEC: Neural and Lexical Combiner, a system which elegantly combines textual and deep-learning based methods for sentiment...

10.18653/v1/s19-2045 article EN cc-by 2019-01-01

In uncertain and probabilistic databases, confidence values (or probabilities) are associated with each data item. Confidence values are assigned to query results based on combining confidences from the input data. Users may wish to apply a threshold on result confidence values, ask for the "top-k" results by confidence, or obtain...

10.1109/icde.2009.141 article EN Proceedings - International Conference on Data Engineering 2009-03-01
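
The two user requests named in the abstract above, a confidence threshold and a "top-k" by confidence, can be sketched naively as follows. This is only a baseline under a strong assumption: every result's confidence has already been computed and is available as a `(row, confidence)` pair. The function names and sample data are hypothetical, and how confidences are combined from input data is not modeled.

```python
import heapq


def apply_threshold(results, min_confidence):
    """Keep only results whose confidence is at least min_confidence."""
    return [(row, conf) for row, conf in results if conf >= min_confidence]


def top_k_by_confidence(results, k):
    """Return the k results with the highest confidence, best first."""
    return heapq.nlargest(k, results, key=lambda pair: pair[1])


# Hypothetical query results with precomputed confidences.
results = [("a", 0.9), ("b", 0.4), ("c", 0.75)]

confident = apply_threshold(results, 0.5)
best_two = top_k_by_confidence(results, 2)
```

With this sample data, both calls return the rows "a" and "c"; a real system would presumably try to avoid computing every confidence exactly before filtering or ranking.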

Mycobacterium chelonae is a rapidly growing mycobacterium that is found all over the environment, including sewage and tap water. It is an important species associated with chronic non-healing wounds. We report a case in a 41-year-old female patient who underwent multiple surgeries for an ovarian cyst, tubo-ovarian abscesses and peritonitis, and repair of an abdominal incisional hernia.

10.4103/0377-4929.134736 article EN cc-by-nc-sa Indian Journal of Pathology and Microbiology 2014-01-01

Knowledge of the strengths and weaknesses of players is the key for team selection and strategy planning in any sport such as Cricket. Computationally, this problem is mostly unexplored. Existing methods focus only on aggregate macroscopic statistics that ignore many details. The central idea of our paper is to mine strength and weakness rules using short text commentary data. This dataset is compact, semi-structured, and accurate, yet ignored by the machine learning community until now. We collect fine-grained information...

10.1109/icmla.2019.00122 article EN 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA) 2019-12-01

In the integrated strategy, one component decides which relation should remain fragmented at different sites. The other decides which local operations, selections and projections, should be performed before join operations. Our experimental results reveal that the choices made by the algorithm in deciding which operations to perform are valid. More precisely, the response times of queries processed are lower than those of the same queries using other strategies. These results agree with the analytic cost model we have previously proposed [YGC87]. In addition,...

10.5555/62597.62612 article EN International Symposium on Databases for Parallel and Distributed Systems 1988-12-05

Having a bot for seamless conversations is a much-desired feature that products and services today seek for their websites and mobile apps. These bots help reduce traffic received by human support significantly by handling frequent and directly answerable known questions. Many such services have huge reference documents such as FAQ pages, which makes it hard for users to browse through this data. A conversation layer over such raw data can lower traffic to human support by a great margin. We demonstrate QnAMaker, a service that creates a conversational layer over semi-structured...

10.1145/3366424.3383525 article EN Companion Proceedings of the Web Conference 2018 2020-04-20

Introduction: Papillary thyroid carcinoma (PTC) is the most common pediatric thyroid malignancy, representing 85–95% of cases. Pediatric thyroid cancers typically present as neck masses with no associated symptoms and thus come to medical attention at widely varying stages of disease progression. In contrast to adult PTC, pediatric PTC tends to be more aggressive at presentation, with a higher incidence of multifocality, extracapsular extension, lymph node metastasis, and distant metastasis. Although the lifetime recurrence rate is high, mortality rates are still...

10.4103/trp.trp_16_24 article EN Thyroid Research and Practice 2024-05-01

Social networks are platforms where content creators and consumers share and consume content. The edge recommendation system, which determines who a member should connect with, significantly impacts the reach and engagement of the audience on such networks. This paper emphasizes improving the experience of inactive members (IMs) who do not have a large connection network by recommending better connections. To that end, we propose a multi-objective linear optimization framework and solve it using accelerated gradient...

10.1145/3543873.3587647 article EN 2023-04-28