- Biomedical Text Mining and Ontologies
- Data Management and Algorithms
- Topic Modeling
- Graph Theory and Algorithms
- Sentiment Analysis and Opinion Mining
- Advanced Database Systems and Queries
- Advanced Graph Neural Networks
- Data Stream Mining Techniques
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Data Quality and Management
- Semantic Web and Ontologies
- Bioinformatics and Genomic Networks
- Real-Time Systems Scheduling
- Parallel Computing and Optimization Techniques
- Complexity and Algorithms in Graphs
- Distributed Systems and Fault Tolerance
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Web Data Mining and Analysis
- Algorithms and Data Compression
- Misinformation and Its Impacts
- Data Visualization and Analytics
- Machine Learning and Algorithms
- Stochastic Gradient Optimization Techniques
Oak Ridge National Laboratory
2018-2023
Veterans Health Administration
2021
The University of Texas at Arlington
2017-2020
University of Illinois Chicago
2019
Virginia Tech
2011
Our society is struggling with an unprecedented amount of falsehoods, hyperboles, and half-truths. Politicians and organizations repeatedly make the same false claims. Fake news floods cyberspace and even allegedly influenced the 2016 election. In fighting false information, the number of active fact-checking projects has grown from 44 in 2014 to 114 in early 2017 [1]. Fact-checkers vet claims by investigating relevant data and documents and publish their verdicts. For instance, PolitiFact.com, one of the earliest and most popular fact-checking projects, gives...
Selectivity estimation, the problem of estimating the result size of queries, is a fundamental problem in databases. Accurately estimating the selectivity of queries involving multiple correlated attributes is especially challenging. Poor cardinality estimates could cause the optimizer to select bad plans. Recently, deep learning has been applied to this problem with promising results. However, many of the proposed approaches often struggle to provide accurate results for multi-attribute queries with a large number of predicates and low selectivity. In this paper, we propose...
Data is generated at an unprecedented rate, surpassing our ability to analyze it. The database community has pioneered many novel techniques for Approximate Query Processing (AQP) that could give approximate results in a fraction of the time needed for computing exact results. In this work, we explore the usage of deep learning (DL) for answering aggregate queries, specifically for interactive applications such as data exploration and visualization. We use deep generative models, an unsupervised learning based approach, to learn...
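The sampling idea behind generative-model AQP can be sketched with a deliberately simple stand-in: fit a closed-form distribution to a numeric column, then answer AVG from synthetic samples instead of scanning the table. The column, the gamma-distributed data, and the lognormal "model" below are all illustrative assumptions, not the architecture used in the paper.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical numeric column (say, order amounts); 200k rows.
# The column and its distribution are invented for illustration.
data = [random.gammavariate(2.0, 50.0) for _ in range(200_000)]

# "Train" a tiny generative model of the column: a lognormal fit by
# matching log-moments, standing in for a deep generative model.
logs = [math.log(v) for v in data]
mu, sigma = statistics.fmean(logs), statistics.pstdev(logs)

# Answer AVG(amount) from a small synthetic sample instead of a scan.
sample = [random.lognormvariate(mu, sigma) for _ in range(20_000)]
approx_avg = statistics.fmean(sample)
exact_avg = statistics.fmean(data)
rel_error = abs(approx_avg - exact_avg) / exact_avg
```

The approximate average lands near the exact one at a fraction of the scan cost; the residual error comes from both the model mismatch and the sampling noise, which is exactly the accuracy/latency trade-off AQP exploits.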
Cancer registries collect unstructured and structured cancer data for surveillance purposes, which provide important insights regarding cancer characteristics, treatments, and outcomes. Registries typically (1) categorize each reportable case or tumor at the time of diagnosis, (2) contain demographic information about the patient, such as age, gender, and location, (3) include planned and completed primary treatment information, and (4) may include survival outcomes. As data is being extracted from various sources, such as pathology reports, radiology...
Selectivity estimation, the problem of estimating the result size of queries, is a fundamental problem in databases. Accurately estimating the selectivity of queries involving multiple correlated attributes is especially challenging. Poor cardinality estimates could cause the optimizer to select bad plans. We investigate the feasibility of using deep learning based approaches for both point and range queries and propose two complementary approaches. Our first approach considers selectivity estimation as an unsupervised density estimation problem. We successfully introduce techniques from...
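The density-estimation view can be illustrated with a deliberately crude unsupervised "model": a joint 2-D histogram over two correlated attributes. The synthetic table, bucket counts, and query ranges below are assumptions for illustration; the paper's deep models capture the joint distribution far more accurately, but the contrast with the attribute-independence baseline is the same.

```python
import random
from collections import Counter

random.seed(0)

# Synthetic table with two correlated attributes: y is a noisy copy of
# x, so assuming attribute independence badly underestimates the
# selectivity of a conjunctive range predicate.
rows = []
for _ in range(50_000):
    x = random.uniform(0, 100)
    y = x + random.gauss(0, 5)
    rows.append((x, y))

# Crude unsupervised "density model": a joint 20x20 histogram.
BUCKETS, LO, HI = 20, -20.0, 120.0
WIDTH = (HI - LO) / BUCKETS

def bucket(v):
    return min(BUCKETS - 1, max(0, int((v - LO) / WIDTH)))

hist = Counter((bucket(x), bucket(y)) for x, y in rows)

def estimate_selectivity(x_lo, x_hi, y_lo, y_hi):
    """Estimate the fraction of rows with x_lo<=x<=x_hi AND
    y_lo<=y<=y_hi by summing the mass of buckets whose centers fall
    inside the query box (a full implementation would also prorate
    buckets that only partially overlap it)."""
    total = 0
    for (bx, by), count in hist.items():
        cx = LO + (bx + 0.5) * WIDTH
        cy = LO + (by + 0.5) * WIDTH
        if x_lo <= cx <= x_hi and y_lo <= cy <= y_hi:
            total += count
    return total / len(rows)

est = estimate_selectivity(40, 60, 40, 60)
true_sel = sum(1 for x, y in rows if 40 <= x <= 60 and 40 <= y <= 60) / len(rows)

# Independence baseline: multiply the per-attribute selectivities.
px = sum(1 for x, _ in rows if 40 <= x <= 60) / len(rows)
py = sum(1 for _, y in rows if 40 <= y <= 60) / len(rows)
indep = px * py
```

Even this coarse joint model lands much closer to the true selectivity than the independence baseline, which is the core argument for modeling the joint density of correlated attributes.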
Sentiment analysis seeks to characterize opinionated or evaluative aspects of natural language text, thus helping people discover valuable information from large amounts of unstructured data [1]. In this paper we explore a new methodology for sentiment analysis called proximity-based sentiment analysis. We take a different approach by considering a set of features based on word proximities in written text. We propose three types of proximity features, namely, proximity distribution, mutual proximity between word types, and proximity patterns. We applied our approach to the...
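The first of these features can be sketched concretely. The mini-lexicon and gap cutoff below are invented for illustration; the paper's lexicon and exact feature definitions differ.

```python
import re
from collections import Counter

# Tiny hypothetical opinion lexicon, standing in for whatever lexicon
# the paper actually uses.
LEXICON = {"good", "great", "bad", "terrible", "excellent", "poor"}

def proximity_distribution(text, max_gap=10):
    """Histogram of token gaps between consecutive opinion-word
    occurrences: a crude version of a proximity-distribution feature."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positions = [i for i, t in enumerate(tokens) if t in LEXICON]
    gaps = Counter()
    for a, b in zip(positions, positions[1:]):
        gaps[min(b - a, max_gap)] += 1  # clamp long gaps into one bin
    return gaps

review = ("The food was good, really good, but the service was terrible "
          "and the decor was poor.")
dist = proximity_distribution(review)
```

The resulting gap histogram (rather than word counts alone) is what a downstream classifier would consume, which is the sense in which proximity, not just presence, carries the sentiment signal.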
Deep learning has surged in popularity and proven to be effective for various artificial intelligence applications, including information extraction from cancer pathology reports. Since word representation is a core unit that enables deep learning algorithms to understand words and perform NLP tasks, this representation must include as much information as possible to help these algorithms achieve high classification performance. Therefore, in this work, in addition to the distributional semantics of large sized corpora, we use UMLS vocabulary resources to enrich the vector space with...
Population-based central cancer registries collect valuable structured and unstructured data primarily for surveillance reporting. The collected data includes (1) categorization of each case (tumor) at the time of diagnosis, (2) demographic information about the patient, such as age, gender, and location, (3) first course of treatment information, and (4) survival outcomes when available. While advanced analytical approaches such as SEER*Stat and SAS exist, we provide a knowledge graph approach to organizing registry analytics...
A key component of deep learning (DL) for natural language processing (NLP) is word embeddings. Word embeddings that effectively capture the meaning and context of the words they represent can significantly improve the performance of downstream DL models on various NLP tasks. Many existing techniques embed words based on their co-occurrence in document text; however, they often cannot capture broader domain-specific relationships between concepts that may be crucial for the task at hand. In this paper, we propose a method to integrate external...
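One established way to fold such external relationships into pretrained embeddings is retrofitting (Faruqui et al., 2015), which nudges each vector toward the average of its knowledge-graph neighbors while staying close to the original distributional vector. Whether this matches the paper's own method is not shown here, so treat it as a sketch of a neighboring technique; the toy 2-D vectors and the tiny "ontology" are made up.

```python
import math

# Made-up toy vectors and relations, purely for illustration.
vectors = {
    "aspirin":   [1.0, 0.0],
    "ibuprofen": [0.0, 1.0],
    "guitar":    [-1.0, -1.0],
}
ontology = {"aspirin": ["ibuprofen"], "ibuprofen": ["aspirin"], "guitar": []}

def retrofit(vectors, ontology, alpha=1.0, beta=1.0, iters=10):
    """Retrofitting-style update: each vector moves toward a weighted
    average of its original value and its graph neighbors."""
    orig = {w: list(v) for w, v in vectors.items()}
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iters):
        for w, nbrs in ontology.items():
            if not nbrs:
                continue  # no graph evidence: keep the original vector
            for d in range(len(new[w])):
                num = alpha * orig[w][d] + beta * sum(new[n][d] for n in nbrs)
                new[w][d] = num / (alpha + beta * len(nbrs))
    return new

new = retrofit(vectors, ontology)
```

After retrofitting, the two related drugs end up closer together than their purely distributional vectors were, while the unrelated word is left untouched, which is precisely the effect external knowledge is meant to have.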
Accurate and synchronized timing information is required by power system operators for controlling the grid infrastructure (relays, Phasor Measurement Units (PMUs), etc.) and determining asset positions. The satellite-based global positioning system (GPS) is the primary source of timing information. However, GPS disruptions today (both intentional and unintentional) can significantly compromise the reliability and security of our electric grids. A robust alternate source of accurate timing is critical to serve both as a deterrent against malicious...
A graph is an excellent way of representing relationships among entities. We can use graph analytics to synthesize and analyze such relational data and extract relevant features that are useful for various tasks such as machine learning. Considering the crucial role of graph analytics in many domains, it is important to investigate, in a timely manner, the right hardware configurations to achieve optimal performance for graph workloads on future high-performance computing systems. Design space exploration studies facilitate the selection of appropriate hardware (e.g. memory) for a...
Machine learning (ML) has gained a pivotal role in answering complex predictive analytic queries. Model building for large scale datasets is one of the most time consuming parts of the data science pipeline. Often, scientists are willing to sacrifice some accuracy in order to speed up this process during the exploratory phase. In this paper, we propose and demonstrate ApproxML, a system that efficiently constructs approximate ML models for new queries from previously constructed models using the concepts of model materialization and reuse...
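The materialize-and-reuse principle is easiest to see with a model whose sufficient statistics compose across data partitions. The partitions and the "model" below (a mean predictor built from per-partition counts and sums) are illustrative stand-ins; ApproxML's actual models and merge rules are richer, but the idea that a new query's model is assembled from stored statistics rather than from raw data is the same.

```python
# Hypothetical data partitions (e.g., one per year).
partitions = {
    "2021": [3.0, 5.0, 7.0],
    "2022": [4.0, 6.0],
    "2023": [10.0, 2.0, 6.0],
}

# Materialize per-partition sufficient statistics exactly once.
stats = {p: (len(v), sum(v)) for p, v in partitions.items()}

def mean_over(parts):
    """Answer a query over any union of partitions from the stored
    (count, sum) statistics, without rescanning raw data."""
    n = sum(stats[p][0] for p in parts)
    s = sum(stats[p][1] for p in parts)
    return s / n

approx = mean_over(["2021", "2022"])
```

Here the merged answer is exact because (count, sum) composes perfectly; for models where the statistics compose only approximately, the same reuse pattern trades a little accuracy for a large saving in model-building time.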
Many disciplines such as biology, economics, engineering, physics, and the social sciences represent their data as graphs to capture patterns, trends, and associations. There are many commercially available graph libraries in different programming languages to analyze these complex graphs. But there is no distributed graph library package in R, a popular statistical language, for graphs bigger than a single machine's memory. Many domain experts prefer R over numerous other alternatives. Towards this, we present an analytics...
Memory design space exploration methods study memory systems' performance and limitations before implementation. The computer design space has grown exponentially because of the enormous growth in memory types, controllers, and application software. Computer simulators are commonly used for design space exploration. However, complex simulations take an enormous amount of time. Hence, in this paper, we propose a machine learning-based method for dynamic random-access and non-volatile memory systems. We applied our method to CosmoGAN and LeNet applications to predict...
Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is a statistically robust alternative to Ordinary Least Squares (OLS). However, while there exist efficient algorithms for OLS, almost all known results for QR are only weakly polynomial. Towards filling this gap, this paper...
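Quantile Regression replaces the squared loss of OLS with the pinball (check) loss rho_tau(r) = r * (tau - 1[r < 0]); tau = 0.5 gives median regression. The paper's contribution concerns exact algorithms with better polynomial behavior; the numeric subgradient sketch below only illustrates the objective, with synthetic data and step sizes chosen for illustration.

```python
import random

random.seed(1)

# Synthetic data: y = 2x + symmetric noise, so the median line has
# slope ~2 and intercept ~0.
xs = [random.uniform(0, 10) for _ in range(2000)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]

def quantile_regression(xs, ys, tau=0.5, lr=0.01, epochs=300):
    """Fit y ~ a*x + b by subgradient descent on the pinball loss
    rho_tau(r) = r * (tau - 1[r < 0]). This numeric method is only a
    stand-in for the exact algorithms studied in the paper."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            r = y - (a * x + b)
            # subgradient of the pinball loss w.r.t. the prediction
            g = -tau if r > 0 else (1 - tau)
            ga += g * x / n
            gb += g / n
        a -= lr * ga
        b -= lr * gb
    return a, b

a, b = quantile_regression(xs, ys)
```

Setting tau to 0.1 or 0.9 instead fits the lower or upper conditional quantile of y, which is what makes QR robust to outliers and informative about the whole conditional distribution rather than just its mean.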
Electronic health records (EHRs) data from the US Department of Veterans Affairs (VA) show that many veterans have been treated with Selective Serotonin Reuptake Inhibitors (SSRIs), a class of antidepressant medications. It is crucial to study these medications in association with other important clinical concepts to make better medication prescribing decisions and to conduct analyses of side effects and comorbidity. In this paper, we used PubMed's knowledge graph and the PageRank algorithm to identify concepts related to three medications that belong to the SSRI...
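PageRank itself is standard; applied to a concept graph, it surfaces concepts that accumulate links from several medications. The toy graph below uses invented stand-in nodes and edges; PubMed's knowledge graph is far larger, but the ranking mechanics are the same.

```python
# Plain PageRank on a toy concept graph (all nodes/edges are
# illustrative assumptions, not PubMed's actual graph).
graph = {
    "sertraline": ["depression", "nausea"],
    "fluoxetine": ["depression", "insomnia"],
    "paroxetine": ["depression"],
    "depression": ["sertraline", "fluoxetine", "paroxetine"],
    "nausea":     ["sertraline"],
    "insomnia":   ["fluoxetine"],
}

def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank; every node here has out-edges, so no
    dangling-node correction is needed."""
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in graph}
        for node, outs in graph.items():
            share = ranks[node] / len(outs)
            for dst in outs:
                new[dst] += damping * share
        ranks = new
    return ranks

ranks = pagerank(graph)
# "depression", linked from all three drugs, ends up ranked highest.
```

Concepts that receive links from multiple drugs accumulate rank, which is the mechanism by which PageRank picks out clinically shared concepts rather than drug-specific ones.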
Pediatric Electronic Health Records (EHRs) contain drug/medication data. Despite the importance of standardizing drug data to identify drug class information and enable interoperability between computer systems, sometimes no biomedical vocabulary is used, and the data is therefore not standardized. This paper employed UMLS vocabularies to standardize Cincinnati Children's Hospital Medical Center (CCHMC) EHR drug data and use it to build models for pediatric mental health trajectories. We present an approach that identifies a...
Mining electronic health records (EHRs) to identify contextually related clinical concept clusters that tend to co-occur temporally and consistently could improve data-driven clinical pathway (CP) construction. However, the automatic extraction of such clusters yields a vast amount of irrelevant information. Hence, this paper proposes a knowledge network-enabled literature-based discovery (LBD) approach to remove noise from the clusters. The authors used published literature to filter spurious concepts from data from the US Department...