- Advanced Database Systems and Queries
- Data Management and Algorithms
- Topic Modeling
- Semantic Web and Ontologies
- Data Quality and Management
- Natural Language Processing Techniques
- Misinformation and Its Impacts
- Cloud Computing and Resource Management
- Parallel Computing and Optimization Techniques
- Service-Oriented Architecture and Web Services
- AI in Service Interactions
- Web Data Mining and Analysis
- Advanced Data Storage Technologies
- Scientific Computing and Data Management
- Sentiment Analysis and Opinion Mining
- Software System Performance and Reliability
- Recommender Systems and Techniques
- Digital Marketing and Social Media
- Data Mining Algorithms and Applications
- Caching and Content Delivery
- Web Application Security Vulnerabilities
- Software Engineering Research
- Real-Time Systems Scheduling
- Genetic and Kidney Cyst Diseases
- Infectious Diseases and Mycology
Index Medical College, Hospital & Research Centre
2024
Tata Consultancy Services (India)
2021-2023
LinkedIn (United States)
2019-2023
Poornima University
2021
Microsoft (Finland)
2020
Microsoft Research (United Kingdom)
2019-2020
Microsoft (United States)
2018
Microsoft (India)
2017
Twitter (United States)
2014
Stanford University
2006-2011
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x...
In the Trio project at Stanford, we are building a new kind of database management system: one in which data, uncertainty and data lineage are all first-class citizens. Trio is based on an extended relational model called ULDBs, and it supports a SQL-based query language called TriQL. Trio was motivated by a number of applications including scientific data management, data cleaning and integration, information extraction systems, and others. We have completed an initial working prototype of the Trio system. We will demonstrate our system, illustrating it through two...
With scalable high-performance storage entirely in DRAM, RAMCloud will enable a new breed of data-intensive applications.
In this paper, we solve the following data summarization problem: given a multi-dimensional data set augmented with a binary attribute, how can we construct an interpretable and informative summary of the factors affecting the binary attribute in terms of the combinations of values of the dimension attributes? We refer to such summaries as explanation tables. We show the hardness of constructing optimally-informative explanation tables from data, and we propose effective and efficient heuristics. The proposed heuristics are based on sampling and include optimizations...
We study how best to schedule scans of large data files, in the presence of many simultaneous requests to a common set of files. The objective is to maximize the overall rate of processing these files by sharing scans of the same file as aggressively as possible, without imposing undue wait time on individual jobs. This scheduling problem arises in batch data processing environments such as Map-Reduce systems, some of which handle tens of thousands of jobs daily, over a shared set of files. As we demonstrate, conventional scheduling techniques such as shortest-job-first do not perform well in the presence of cross-job...
The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g. BigTable, Dynamo, Cassandra, etc.), are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in such a database. For web applications, minimizing update...
Prior work has identified set based comparisons as a useful primitive for supporting a wide variety of similarity functions in record matching. Accordingly, various techniques have been proposed to improve the performance of set similarity lookups. However, this body of work focuses almost exclusively on symmetric notions of set similarity. In this paper, we study the indexing problem for the asymmetric Jaccard containment similarity function that is an error-tolerant variation of set containment. We enhance this similarity function to also account for string transformations that reflect synonyms...
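The difference between a symmetric set similarity and the asymmetric containment notion mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common unweighted definitions (Jaccard as |A ∩ B| / |A ∪ B|, Jaccard containment of a query in a record as |Q ∩ R| / |Q|); the function names and the token sets are hypothetical, and the paper itself may use a weighted variant and an index rather than a direct computation.

```python
def jaccard(a, b):
    """Symmetric Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # Convention (assumed here): two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def jaccard_containment(query, record):
    """Asymmetric containment of `query` in `record`: |Q ∩ R| / |Q|."""
    query, record = set(query), set(record)
    # Convention (assumed here): an empty query is trivially contained.
    return len(query & record) / len(query) if query else 1.0


# Hypothetical token sets for a short query and a longer record.
q = {"microsoft", "corp"}
r = {"microsoft", "corp", "redmond", "wa"}

sym = jaccard(q, r)                    # 2 shared tokens / 4 distinct tokens
q_in_r = jaccard_containment(q, r)     # every query token appears in the record
r_in_q = jaccard_containment(r, q)     # only half of the record tokens appear in the query
```

Swapping the arguments changes the containment score but not the Jaccard score, which is the asymmetry the abstract refers to.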
There has been considerable past work studying data integration and uncertain data in isolation. We develop the foundations for local-as-view (LAV) data integration when the sources being integrated are uncertain. We motivate two distinct settings for uncertain-data integration. We then define containment of uncertain databases in these settings, which allows us to express uncertain sources as views over a virtual mediated uncertain database. Next, we define consistency of a set of uncertain sources and show intractability of consistency-checking. We identify an interesting special case for which consistency-checking...
Existing Machine Learning techniques yield close to human performance on text-based classification tasks. However, the presence of multi-modal noise in chat data such as emoticons, slang, spelling mistakes, code-mixed data, etc. makes existing deep-learning solutions perform poorly. The inability of deep-learning systems to robustly capture these covariates puts a cap on their performance. We propose NELEC: Neural and Lexical Combiner, a system which elegantly combines textual and deep-learning based methods for sentiment...
In uncertain and probabilistic databases, confidence values (or probabilities) are associated with each data item. Confidence values are assigned to query results based on combining confidences from the input data. Users may wish to apply a threshold on result confidence values, ask for the "top-k" results by confidence, or obtain...
Mycobacterium chelonae is a rapidly growing mycobacterium that is found all over the environment, including sewage and tap water. It is an important species associated with chronic non-healing wounds. We report a case in a 41-year-old female patient who underwent multiple surgeries for an ovarian cyst, tubo-ovarian abscesses, peritonitis and repair of an abdominal incisional hernia.
Knowledge of the strengths and weaknesses of players is the key for team selection and strategy planning in any sport such as Cricket. Computationally, this problem is mostly unexplored. Existing methods focus only on aggregate macroscopic statistics that ignore many details. The central idea of our paper is to mine strength and weakness rules of players using short text commentary data. This dataset is compact, semi-structured, accurate, yet ignored by the machine learning community until now. We collect fine-grained information...
In the integrated strategy, one component decides which relation should remain fragmented at different sites. The other component decides which local operations, selections and projections, should be performed before join operations. Our experimental results reveal that the choices made by the algorithm in deciding which local operations to perform are valid. More precisely, the response times of queries processed by the integrated strategy are lower than those of the same queries processed using other strategies. These results agree with the analytic cost model we have previously proposed [YGC87]. In addition,...
Having a bot for seamless conversations is a much-desired feature that products and services today seek for their websites and mobile apps. These bots help reduce traffic received by human support significantly by handling frequent and directly answerable known questions. Many such products and services have huge reference documents such as FAQ pages, which makes it hard for users to browse through this data. A conversation layer over such raw data can lower traffic to human support by a great margin. We demonstrate QnAMaker, a service that creates a conversational layer over semi-structured...
Introduction: Papillary thyroid carcinoma is the most common pediatric thyroid malignancy, representing 85–95% of cases. Pediatric thyroid cancers typically present as neck masses with no associated symptoms and thus come to medical attention at widely varying stages of disease progression. In contrast to adult PTC, pediatric PTC tends to be more aggressive at presentation, with a higher incidence of multifocality, extracapsular extension, lymph node and distant metastasis. Although the lifetime recurrence rate is high, mortality rates are still...
Social networks are platforms where content creators and consumers share and consume content. The edge recommendation system, which determines who a member should connect with, significantly impacts the reach and engagement of the audience on such networks. This paper emphasizes improving the experience of inactive members (IMs) who do not have a large connection network by recommending better connections. To that end, we propose a multi-objective linear optimization framework and solve it using accelerated gradient...