Sudipto Das

ORCID: 0009-0007-6154-1504
Research Areas
  • Distributed systems and fault tolerance
  • Cloud Computing and Resource Management
  • Advanced Database Systems and Queries
  • Advanced Data Storage Technologies
  • Rheumatoid Arthritis Research and Therapies
  • Systemic Lupus Erythematosus Research
  • Data Management and Algorithms
  • Distributed and Parallel Computing Systems
  • Autoimmune and Inflammatory Disorders Research
  • Privacy, Security, and Data Protection
  • Internet Traffic Analysis and Secure E-voting
  • Privacy-Preserving Technologies in Data
  • Data Mining Algorithms and Applications
  • Peer-to-Peer Network Technologies
  • Caching and Content Delivery
  • Spondyloarthritis Studies and Treatments
  • Robotic Path Planning Algorithms
  • Software System Performance and Reliability
  • Mobile Ad Hoc Networks
  • Data Quality and Management
  • Opportunistic and Delay-Tolerant Networks
  • Wireless Networks and Protocols
  • Mobile Agent-Based Network Management
  • Monoclonal and Polyclonal Antibodies Research
  • Immunodeficiency and Autoimmune Disorders

Amazon (United States)
2020-2023

Khulna University
2020-2021

Microsoft (United States)
2011-2019

Yale University
2019

Microsoft Research (United Kingdom)
2013-2018

Leidos (United States)
2018

Leidos Biomedical Research Inc. (United States)
2018

Frederick National Laboratory for Cancer Research
2018

Royal College of Surgeons in Ireland
2018

University of Leeds
2013-2016

Scalable database management systems (DBMS), both for update-intensive application workloads and for decision support systems for descriptive and deep analytics, are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from traditional enterprise infrastructures to next-generation cloud infrastructures. Though scalable data management has been a vision for more than three decades and much research has focussed on large-scale data management in the traditional enterprise setting, cloud computing brings its own set of novel...

10.1145/1951365.1951432 article EN 2011-03-21

To monitor progression to inflammatory arthritis (IA) in individuals with non-specific musculoskeletal (MSK) symptoms and positive anti-cyclic citrullinated peptide (anti-CCP) antibodies, and to develop a pragmatic model to predict development of IA in this patient group. In a prospective observational cohort, patients with new non-specific MSK symptoms and positive anti-CCP were recruited from regional primary care and secondary care referrals. Clinical, imaging and serological parameters were assessed at baseline. Cox regression analysis was performed to identify...

10.1136/annrheumdis-2014-205227 article EN Annals of the Rheumatic Diseases 2014-04-12

Cloud computing has emerged as a preferred platform for deploying scalable web-applications. With the growing scale of these applications and the data associated with them, scalable data management systems form a crucial part of the cloud infrastructure. Key-Value stores, such as Bigtable, PNUTS, Dynamo, and their open source analogues, have been the preferred data stores for applications in the cloud. In these systems, data is represented as Key-Value pairs, and atomic access is provided only at the granularity of single keys. While these properties work well for current applications, they are insufficient for the next...

10.1145/1807128.1807157 article EN 2010-06-10

The ubiquity of location enabled devices has resulted in a wide proliferation of location based applications and services. To handle the growing scale, database management systems driving such location based services (LBS) must cope with high insert rates for location updates of millions of devices, while supporting efficient real-time analysis on the latest location. Traditional DBMSs, equipped with multi-dimensional index structures, can efficiently handle spatio-temporal data. However, popular open source relational database systems are overwhelmed by the high insertion...

10.1109/mdm.2011.41 article EN 2011-06-01

Multitenant data infrastructures for large cloud platforms hosting hundreds of thousands of applications face the challenge of serving applications characterized by small data footprint and unpredictable load patterns. When such a platform is built on an elastic pay-per-use infrastructure, an added challenge is to minimize the system's operating cost while guaranteeing the tenants' service level agreements (SLA). Elastic load balancing is therefore an important feature to enable scale-up during high load while scaling down when the load is low. Live migration, a technique...

10.1145/1989323.1989356 article EN 2011-06-12

Database systems serving cloud platforms must serve large numbers of applications (or tenants). In addition to managing tenants with small data footprints, different schemas, and variable load patterns, such multitenant data platforms must minimize their operating costs by efficient resource sharing. When deployed over a pay-per-use infrastructure, elastic scaling and load balancing, enabled by low cost live migration of tenant databases, is critical to tolerate load variations while minimizing operating cost. However, existing databases---relational...

10.14778/2002974.2002977 article EN Proceedings of the VLDB Endowment 2011-05-01

Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential for marketplace competitiveness. This need to perform deep analysis over huge data repositories poses a significant challenge to existing statistical software and data management systems. On the one hand, statistical software provides rich functionality for data analysis and modeling, but can handle only limited amounts of data; e.g., popular packages...

10.1145/1807167.1807275 article EN 2010-06-06

Over the last couple of years, "Cloud Computing" or "Elastic Computing" has emerged as a compelling and successful paradigm for internet scale computing. One of the major contributing factors to this success is the elasticity of resources. In spite of the elasticity provided by the infrastructure and the scalable design of the applications, the elephant (or the underlying database), which drives most of these web-based applications, is not very elastic and scalable, and hence limits scalability. In this paper, we propose ElasTraS, which addresses this issue of scalability and elasticity of the data store in a cloud computing...

10.48550/arxiv.1008.3751 preprint EN other-oa arXiv (Cornell University) 2010-01-01

A database management system (DBMS) serving a cloud platform must handle large numbers of application databases (or tenants) that are characterized by diverse schemas, varying footprints, and unpredictable load patterns. Scaling out using clusters of commodity servers and sharing resources among tenants (i.e., multitenancy) are important features of such systems. Moreover, when deployed on a pay-per-use infrastructure, minimizing the system's operating cost while ensuring good performance is also an important goal....

10.1145/2445583.2445588 article EN ACM Transactions on Database Systems 2013-04-01

State-of-the-art index tuners rely on query optimizer's cost estimates to search for the index configuration with the largest estimated execution cost improvement. Due to well-known limitations in optimizer's estimates, in a significant fraction of cases, an index estimated to improve a query's execution cost, e.g., CPU time, makes that worse when implemented. Such errors are a major impediment for automated indexing in production systems. We observe that comparing the execution cost of two plans of the same query corresponding to different index configurations is a key step during index tuning. Instead of using optimizer's estimates for such...

10.1145/3299869.3324957 article EN Proceedings of the 2019 International Conference on Management of Data 2019-06-18

Cloud computing is an extremely successful paradigm of service oriented computing and has revolutionized the way computing infrastructure is abstracted and used. Three most popular cloud paradigms include: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The concept however can also be extended to Database as a Service and many more. Elasticity, pay-per-use, low upfront investment, low time to market, and transfer of risks are some of the major enabling features that make cloud computing a ubiquitous paradigm for deploying novel applications which were not...

10.14778/1920841.1921063 article EN Proceedings of the VLDB Endowment 2010-09-01

The increasing popularity of social networks has initiated a fertile research area in information extraction and data mining. Although such analysis can facilitate better understanding of sociological, behavioral, and other interesting phenomena, there is growing concern about personal privacy being breached, thereby requiring effective anonymization techniques. In this paper, we consider edge weight anonymization in social graphs. Our approach builds a linear programming (LP) model which preserves properties of the graph that...

10.1109/icde.2010.5447915 article EN 2010 IEEE 26th International Conference on Data Engineering (ICDE) 2010-03-01

Memory is a crucial resource in relational databases (RDBMSs). When there is insufficient memory, RDBMSs are forced to use slower media such as SSDs or HDDs, which can significantly degrade workload performance. Cloud database services are deployed in data centers where network adapters supporting remote direct memory access (RDMA) at low latency and high bandwidth are becoming prevalent. We study the novel problem of how a Symmetric Multi-Processing (SMP) RDBMS, whose memory demands exceed locally-available...

10.1145/2882903.2882949 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-14

An appropriate set of indexes can result in orders of magnitude better query performance. Index management is a challenging task even for expert human administrators. Fully automating this process is of significant value. We describe the challenges, architecture, design choices, implementation, and learnings from building an industrial-strength auto-indexing service for Microsoft Azure SQL Database, a relational database service. Our service has been generally available for more than two years, generating index...

10.1145/3299869.3314035 article EN Proceedings of the 2019 International Conference on Management of Data 2019-06-18

Recent research has shown promising results by using machine learning (ML) techniques to improve the performance of database systems, e.g., in query optimization or index recommendation. However, in many production deployments, the ML models' performance degrades significantly when the test data diverges from the data used to train these models. In this paper, we address this performance degradation by using B-instances to collect additional data during deployment. We propose an active data collection platform, ADCP, that employs active learning (AL) to gather relevant...

10.1145/3318464.3389768 article EN 2020-05-29

Relational Database-as-a-Service (DaaS) platforms today support the abstraction of a resource container that guarantees a fixed amount of resources. Tenants are responsible for selecting a container size suitable for their workloads, which they can change to leverage the cloud's elasticity. However, automating this task is daunting for most tenants since estimating resource demands for arbitrary SQL workloads in an RDBMS is complex and challenging. In addition, workloads and resource requirements can vary significantly within minutes to hours, and container sizes vary by orders...

10.1145/2882903.2903733 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-16

A multitenant database management system (DBMS) in the cloud must continuously monitor the trade-off between efficient resource sharing among multiple application databases (tenants) and their performance. Considering the scale of hundreds to thousands of tenants in such multitenant DBMSs, manual approaches for continuous monitoring are not tenable. A self-managing controller of a multitenant DBMS faces several challenges. For instance, how to characterize a tenant given its variety of workloads, how to reduce the impact of tenant colocation, and how to detect...

10.1145/2463676.2465308 article EN 2013-06-22

There has been a resurgence of work on replicated, distributed database systems to meet the demands of intermittently-connected clients and of disaster-tolerant databases that span data centers. Many systems weaken the criteria for replica-consistency or isolation, and in some cases add new mechanisms, to improve partition-tolerance, availability, and performance. We present a framework for comparing these criteria and mechanisms, to help architects navigate through this complex design space.

10.1145/2463676.2465339 article EN 2013-06-22

Objectives: To evaluate the efficacy and safety of two different targeted approaches, abatacept or tocilizumab, after rituximab therapy in rheumatoid arthritis, and to explain the observed difference using blood and synovial studies of interleukin 6 (IL-6) and B cells in patients receiving rituximab therapy. Methods: Consecutive series of patients who had discontinued rituximab owing to inefficacy or toxicity were treated with abatacept (n=16) or tocilizumab (n=35). Clinical response and reasons for discontinuation were evaluated. Serial samples...

10.1136/annrheumdis-2013-204417 article EN Annals of the Rheumatic Diseases 2014-01-02

Many real-world data stream analysis applications such as network monitoring, click stream analysis, and others require combining multiple streams of data arriving from multiple sources. This is referred to as multi-stream analysis. To deal with high stream arrival rates, it is desirable that such systems be capable of supporting very high processing throughput. The advent of multicore processors and powerful servers driven by these processors calls for efficient parallel designs that can effectively utilize the parallelism of the multicores, since performance improvement is possible...

10.14778/1687627.1687653 article EN Proceedings of the VLDB Endowment 2009-08-01