Alexander Rasin

ORCID: 0000-0001-7282-5763
Research Areas
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Digital and Cyber Forensics
  • Advanced Malware Detection Techniques
  • Advanced Data Storage Technologies
  • Data Quality and Management
  • AI in cancer detection
  • Semantic Web and Ontologies
  • Biomedical Text Mining and Ontologies
  • Cloud Data Security Solutions
  • Cloud Computing and Resource Management
  • Privacy-Preserving Technologies in Data
  • Cryptography and Data Security
  • Radiomics and Machine Learning in Medical Imaging
  • Topic Modeling
  • Security and Verification in Computing
  • Radiology practices and education
  • Imbalanced Data Classification Techniques
  • Peer-to-Peer Network Technologies
  • Software Engineering Research
  • Web Data Mining and Analysis
  • Network Security and Intrusion Detection
  • Scientific Computing and Data Management
  • Data Mining Algorithms and Applications
  • Software System Performance and Reliability

DePaul University
2015-2024

Povolzhsky Research Institute of Production and Processing of Meat and Dairy Products
2023

Brown University
2005-2016

John Brown University
2003-2010

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open...

10.1145/1559845.1559865 article EN 2009-06-29

The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their databases on high-end proprietary machines, and moving towards cheaper, lower-end, commodity hardware, typically arranged in a shared-nothing MPP architecture, often virtualized inside public or private "clouds". At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis....

10.14778/1687627.1687731 article EN Proceedings of the VLDB Endowment 2009-08-01

Our previous work has shown that architectural and application shifts have resulted in modern OLTP databases increasingly falling short of optimal performance [10]. In particular, the availability of multiple-cores, the abundance of main memory, the lack of user stalls, and the dominant use of stored procedures are factors that portend a clean-slate redesign of RDBMSs. This work showed that such a redesign has the potential to outperform legacy OLTP databases by a significant factor. These results, however, were obtained using a bare-bones prototype that was developed just...

10.14778/1454159.1454211 article EN Proceedings of the VLDB Endowment 2008-08-01

MapReduce complements DBMSs since databases are not designed for extract-transform-load tasks, a MapReduce specialty.

10.1145/1629175.1629197 article EN Communications of the ACM 2009-12-21

Stream-processing systems are designed to support an emerging class of applications that require sophisticated and timely processing of high-volume data streams, often originating in distributed environments. Unlike traditional data-processing applications that require precise recovery for correctness, many stream-processing applications can tolerate and benefit from weaker guarantees. In this paper, we study various recovery guarantees and pertinent recovery techniques that can meet the correctness and performance requirements of stream-processing applications. We discuss the design and algorithmic...

10.1109/icde.2005.72 article EN 2005-04-19

Borealis is a distributed stream processing engine that is being developed at Brandeis University, Brown University and MIT. Borealis inherits core functionality from Aurora and inter-node communication from Medusa. We propose to demonstrate some of the key aspects of operation in Borealis, using a multi-player network game as the underlying application. The demonstration will illustrate the dynamic resource management, query optimization and high availability mechanisms employed by Borealis, using visual performance-monitoring tools as well as the gaming experience.

10.1145/1066157.1066274 article EN 2005-06-14

We describe an automatic database design tool that exploits correlations between attributes when recommending materialized views (MVs) and indexes. Although there is a substantial body of related work exploring how to select an appropriate set of MVs and indexes for a given workload, none of this work has explored the effect of correlated attributes (e.g., attributes encoding geographic information) on designs. Our tool identifies a set of MVs and secondary indexes such that correlations between the clustered attributes of the MVs and the secondary indexes are enhanced, which can dramatically improve query performance. It uses a form of Integer...

10.14778/1920841.1920979 article EN Proceedings of the VLDB Endowment 2010-09-01

Forensic tools assist analysts with recovery of both the data and system events, even from corrupted storage. These tools typically rely on "file carving" techniques to restore files after metadata loss by analyzing the remaining raw file content. A significant amount of sensitive data is stored and processed in relational databases, thus creating the need for database forensic tools that will extend file carving solutions to the database realm. Raw database storage is partitioned into individual "pages" that cannot be read or presented to the analyst without the help...

10.1016/j.diin.2015.05.013 article EN cc-by-nc-nd Digital Investigation 2015-08-01

Background: College can be stressful for many freshmen as they cope with a variety of stressors. Excess stress can negatively affect both psychological and physical health. Thus, there is a need to find innovative and cost-effective strategies to help identify students experiencing high levels of stress to receive appropriate treatment. Social media use has been rapidly growing, and recent studies have reported that data from these technologies can be used for public health surveillance. Currently, no studies have examined whether Twitter...

10.2196/mental.5626 article EN cc-by JMIR Mental Health 2017-01-10

In relational query processing, there are generally two choices for access paths when performing a predicate lookup for which no clustered index is available. One option is to use an unclustered index. Another is to perform a complete sequential scan of the table. Many analytical workloads do not benefit from the availability of unclustered indexes; the cost of random disk I/O becomes prohibitive for all but the most selective queries. It has been observed that a secondary index on an attribute can perform well under certain conditions if the attribute is correlated with...

10.14778/1687627.1687765 article EN Proceedings of the VLDB Endowment 2009-08-01

Database Management Systems (DBMS) are routinely used to store and process sensitive enterprise data. However, it is not possible to secure data by relying on the access control and security mechanisms (e.g., audit logs) of such systems alone; users may abuse their privileges (no matter whether granted or gained illegally) or circumvent security mechanisms to maliciously alter data. Thus, in addition to taking preventive measures, the major goal of database security is to 1) detect breaches and 2) gather evidence about attacks for devising counter measures....

10.1016/j.diin.2017.06.006 article EN cc-by-nc-nd Digital Investigation 2017-08-01

Software project artifacts such as source code, requirements, and change logs represent a gold-mine of actionable information. As a result, software analytic solutions have been developed to mine repositories and answer questions such as "who is the expert?," "which classes are fault prone?," or even "who are the domain experts for these fault-prone classes?" Analytics often require training and configuring in order to maximize performance within the context of each project. A cold-start problem exists when a function is applied without...

10.1145/2901739.2901740 article EN 2016-05-14

Good database design is typically a very difficult and costly process. As database systems get more complex and as the amount of data under management grows, the stakes increase accordingly. Past research produced a number of design tools capable of automatically selecting secondary indexes and materialized views for a known workload. However, a significant bulk of research on automated database design has been done in the context of row-store DBMSes. While this work has produced effective tools, new specialized database architectures demand a rethinking of automated design algorithms.

10.1145/2452376.2452402 article EN 2013-03-18

When a file is deleted, the storage it occupies is de-allocated but the contents of the file are not erased. An extensive selection of file carving tools and techniques is available to forensic analysts, yet existing tools cannot recover database contents because all database engines use a proprietary and unique storage format. Database systems are widely used to store and process data both on a large scale (e.g., enterprise networks) and for personal use (e.g., SQLite in mobile devices or Firefox). For some databases, users can purchase specialized recovery tools capable of discovering...

10.1016/j.diin.2016.04.015 article EN cc-by-nc-nd Digital Investigation 2016-08-01

Today's digitized world urgently needs Big Data integration and analysis. Healthcare records are responsible for generating petabytes of data in a single day. Such data is heterogeneous in nature, captured in different files and formats, and varies from hospital to hospital. By integrating data sources and extracting meaningful information for the medical community, we can improve the overall quality of patient care. Our research targets the problem of integrating health records. To start, we have already developed the Integrated Radiology Image search (IRIS)...

10.1109/lsc.2018.8572185 article EN 2018-10-01

Data privacy requirements are a complex and quickly evolving part of the data management domain. Especially in Healthcare (e.g., United States Health Insurance Portability and Accountability Act and Veterans Affairs requirements), there has been a strong emphasis on data protection. Data storage is governed by multiple sources of policy requirements, including internal policies and legal requirements imposed by external governing organizations. Within a database, a single value can be subject to requirements on how long it must be preserved and when...

10.1145/3538712.3538718 article EN 2022-07-06

Vast amounts of clinical and biomedical research data are produced daily. These data can help enable data-driven healthcare through novel discoveries, improved diagnostics processes, epidemiology, and education. However, finding and gaining access to these data and the relevant metadata that are necessary to achieve these goals remains a challenge. Furthermore, management of these data and enabling their widespread, albeit controlled, use poses a major challenge for data producers. These data sources are often geographically distributed, with diverse characteristics, and are controlled...

10.3390/data4020054 article EN cc-by Data 2019-04-20

Software projects produce large quantities of data such as feature requests, requirements, design artifacts, source code, tests, safety cases, release plans, and bug reports. If leveraged effectively, this data can be used to provide project intelligence that supports diverse software engineering activities such as release planning, impact analysis, and software analytics. However, project stakeholders often lack the skills to formulate the complex queries needed to retrieve, manipulate, and display the data in meaningful ways. To address these challenges...

10.1109/ase.2017.8115714 article EN 2017-10-01