Zijun Yao

ORCID: 0000-0003-3647-8770
Research Areas
  • Machine Learning in Healthcare
  • Topic Modeling
  • Human Mobility and Location-Based Analysis
  • Recommender Systems and Techniques
  • Artificial Intelligence in Healthcare
  • Biomedical Text Mining and Ontologies
  • Housing Market and Economics
  • Data Management and Algorithms
  • Traffic Prediction and Management Techniques
  • Data Quality and Management
  • Data-Driven Disease Surveillance
  • Musculoskeletal pain and rehabilitation
  • Artificial Intelligence in Healthcare and Education
  • AI in cancer detection
  • Text Readability and Simplification
  • Intelligent Tutoring Systems and Adaptive Learning
  • Numerical Methods and Algorithms
  • Ferroelectric and Negative Capacitance Devices
  • Machine Learning and Data Classification
  • Data Stream Mining Techniques
  • Adversarial Robustness in Machine Learning
  • Advanced Graph Neural Networks
  • Mathematics Education and Teaching Techniques
  • Language and cultural evolution
  • Generative Adversarial Networks and Image Synthesis

University of Kansas
2021-2025

University of Florida
2024

First Affiliated Hospital of Xi'an Jiaotong University
2023

IBM (United States)
2019-2021

IBM Research - Thomas J. Watson Research Center
2019

Rutgers Sexual and Reproductive Health and Rights
2014-2018

Rutgers, The State University of New Jersey
2013-2018

Rütgers (Germany)
2014

Sun Yat-sen University
2012

The problem of point of interest (POI) recommendation is to provide personalized recommendations of places of interest, such as restaurants, for mobile users. Due to its complexity and its connection to location based social networks (LBSNs), the decision process of a user choosing a POI is complex and can be influenced by various factors, such as user preferences, geographical influences, and user mobility behaviors. While there are some studies on POI recommendations, integrated analysis of the joint effect of multiple factors is lacking. To this end, in...

10.1145/2487575.2487673 article EN 2013-08-11

Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In this paper, we develop a dynamic statistical model to learn time-aware word vector representation. We propose a model that simultaneously learns time-aware embeddings and solves...

10.1145/3159652.3159703 preprint EN 2018-02-02

The problem of point of interest (POI) recommendation is to provide personalized recommendations of places, such as restaurants and movie theaters. The increasing prevalence of mobile devices and of location based social networks (LBSNs) poses significant new opportunities as well as challenges, which we address. The decision process for a user to choose a POI is complex and can be influenced by numerous factors, such as personal preferences, geographical considerations, and user mobility behaviors. This is further complicated by the connection LBSNs...

10.1109/tkde.2014.2362525 article EN IEEE Transactions on Knowledge and Data Engineering 2014-10-09

Point of interest (POI) recommendation, which provides personalized recommendation of places to mobile users, is an important task in location-based social networks (LBSNs). However, quite different from traditional interest-oriented merchandise recommendation, POI recommendation is more complex due to the timing effects: we need to examine whether the POI fits a user's availability. While there are some prior studies that included the temporal effect in POI recommendations, they overlooked the compatibility between the time-varying popularity of POIs and regular...

10.1109/icdm.2016.0066 article EN 2016-12-01

Urban functions refer to the purposes of land use in cities where each zone plays a distinct role and cooperates with other zones to serve people’s various life needs. Understanding zone functions helps to solve a variety of urban related problems, such as increasing traffic capacity and enhancing location-based service. Therefore, it is beneficial to investigate how to learn the representations of city zones in terms of urban functions, for better supporting urban analytic applications. To this end, in this paper, we propose a framework to learn the vector representation...

10.24963/ijcai.2018/545 article EN 2018-07-01

It is traditionally a challenge for home buyers to understand, compare and contrast the investment values of real estates. While a number of estate appraisal methods have been developed to value real property, the performances of these methods have been limited by the traditional data sources for estate appraisal. However, with the development of new ways of collecting estate-related mobile data, there is a potential to leverage geographic dependencies of estates for enhancing estate appraisal. Indeed, the geographic dependencies of the value of an estate can be from the characteristics of its own neighborhood (individual), the values of its nearby estates (peer),...

10.1145/2623330.2623675 article EN 2014-08-22

Ranking residential real estates based on investment values can provide decision making support for home buyers and thus plays an important role in the estate marketplace. In this paper, we aim to develop methods for ranking estates by mining users' opinions about estates from online user reviews and offline moving behaviors (e.g., taxi traces, smart card transactions, check-ins). While a variety of features could be extracted from these data, these features are intercorrelated and redundant. Thus, selecting good features and integrating the feature...

10.1109/icdm.2014.18 article EN 2014-12-01

If properly analyzed, the multi-aspect rating data could be a source of rich intelligence for providing personalized restaurant recommendations. Indeed, while recommender systems have been studied for various applications and many recommendation techniques have been developed for general or specific recommendation tasks, there are few studies of restaurant recommendation that address the unique challenges of multi-aspect restaurant reviews. As we know, traditional collaborative filtering methods are typically developed for single aspect ratings. However, multi-aspect ratings are often collected from restaurant customers. These...

10.1137/1.9781611973440.54 article EN 2014-04-28

With ChatGPT under the spotlight, utilizing large language models (LLMs) to assist academic writing has drawn a significant amount of debate in the community. In this paper, we aim to present a comprehensive study of the detectability of ChatGPT-generated content within the academic literature, particularly focusing on the abstracts of scientific papers, to offer holistic support for the future development of LLM applications and policies in academia. Specifically, we first present GPABench2, a benchmarking dataset of over 2.8 million comparative samples...

10.48550/arxiv.2306.05524 preprint EN cc-by arXiv (Cornell University) 2023-01-01

For the care of chronic diseases (e.g., depression, diabetes, hypertension), it is critical to identify effective treatment pathways that aim to promptly update the medication following the change of patient state and disease progression. This task is challenging because the optimal treatment pathway for each patient needs to be personalized due to the significant heterogeneity among individuals. Therefore, it is naturally promising to investigate how to use the abundant electronic health records to recommend effective and safe prescriptions. However, prescription...

10.1145/3579994 article EN ACM transactions on office information systems 2023-01-12

We proposed an Interpretable Personalized Artificial Intelligence (AI) model for PRO measures via Recurrent Neural Networks (RNN) and attention scores, with data from an open label randomized clinical trial of pain in 402 participants with cryptogenic sensory polyneuropathy at 40 neurology care clinics. All patients were assigned to four treatment groups: nortriptyline, duloxetine, pregabalin, and mexiletine. Each patient had 4 PRO measures (quality of life SF-12; PROMIS: pain interference, fatigue, sleep disturbance) time...

10.1080/10543406.2025.2469884 article EN Journal of Biopharmaceutical Statistics 2025-03-13

Best-of-N (BoN) sampling, a common strategy for test-time scaling of Large Language Models (LLMs), relies on reward models to select the best candidate solution from multiple generations. However, traditional reward models often assign arbitrary and inconsistent scores, limiting their effectiveness. To address this, we propose a Pairwise Reward Model (Pairwise RM) combined with a knockout tournament for BoN sampling. Instead of assigning absolute scores, given one math problem, Pairwise RM evaluates two candidate solutions' correctness...

10.48550/arxiv.2501.13007 preprint EN arXiv (Cornell University) 2025-01-22
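
The knockout-tournament selection mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `knockout_best_of_n` and the `prefer_first` callback are hypothetical, and the callback merely stands in for whatever pairwise judge (for example, a pairwise reward model) decides which of two candidates is better.

```python
import random
from typing import Callable, List, Optional


def knockout_best_of_n(
    candidates: List[str],
    prefer_first: Callable[[str, str], bool],
    rng: Optional[random.Random] = None,
) -> str:
    """Return the candidate that survives a single-elimination tournament.

    prefer_first(a, b) is a hypothetical pairwise judge: it returns True
    when candidate a should advance over candidate b.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    rng = rng or random.Random(0)
    pool = list(candidates)
    rng.shuffle(pool)  # random initial pairing
    while len(pool) > 1:
        next_round = []
        # Compare neighbours pairwise; the winner of each pair advances.
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if prefer_first(a, b) else b)
        # With an odd pool size, the unpaired last candidate gets a bye.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]
```

With N candidates the tournament makes N - 1 pairwise comparisons, and no absolute score is ever assigned to a single candidate.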

Educator preparation, personalized learning (PL) implementation, and applications of Generative AI converge as three interrelated systems that, when carefully designed, can help achieve the long-sought goal of providing inclusive education for all learners. However, realizing this potential comes with challenges resulting from theoretical complexities and technological constraints. This article provides an analysis of the complex interconnectedness among these systems guided by Cultural-Historical Activity Theory...

10.1177/00224871251325109 article EN Journal of Teacher Education 2025-03-19

Spatial co-location patterns are subsets of spatial features usually located together in geographic space. Recent literature has provided different approaches to discover co-location patterns over point spatial data. However, most approaches consider the neighborhood relationship among spatial objects as binary and are mainly designed for point features, and are thus not appropriate for extended spatial features such as line strings and polygons, whose neighborhood relationship is naturally continuous. This paper adopts a buffer-based model for measuring and mining co-location patterns. While the model has several advantages, it involves high...

10.1109/tkde.2019.2930598 article EN publisher-specific-oa IEEE Transactions on Knowledge and Data Engineering 2019-07-23

It is traditionally a challenge for home buyers to understand, compare, and contrast the investment value of real estate. Although a number of appraisal methods have been developed to value real properties, the performances of these methods have been limited by traditional data sources for real estate appraisal. With the development of new ways of collecting estate-related mobile data, there is a potential to leverage geographic dependencies of real estate for enhancing real estate appraisal. Indeed, the geographic dependencies of the investment value of an estate can be from the characteristics of its own neighborhood (individual), the values of its nearby estates (peer),...

10.1145/2934692 article EN ACM Transactions on Knowledge Discovery from Data 2016-08-27

With ChatGPT under the spotlight, utilizing large language models (LLMs) to assist academic writing has drawn a significant amount of debate in the community. In this paper, we aim to present a comprehensive study of the detectability of ChatGPT-generated content within the academic literature, particularly focusing on the abstracts of scientific papers, to offer holistic support for the future development of LLM applications and policies in academia. Specifically, we first present GPABench2, a benchmarking dataset of over 2.8 million comparative samples...

10.1145/3658644.3670392 article EN cc-by 2024-12-02

Missing data points are prevalent in electronic health records (EHRs) and are an impediment to utilizing machine learning for predictive and classification tasks in healthcare. For this challenge, we developed eXITs, a stacked ensemble learner that employs 6 base models to perform imputation on time series data from 13 different laboratory tests across 8,267 patients from the MIMIC-III database provided by the ICHI 2019 Data Analytics Challenge on Missing data Imputation (DACMI). The results show that our model (avg. nRMSE = 0.200)...

10.1109/ichi.2019.8904779 article EN 2019-06-01

Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) possess the capability to effectively learn arithmetic operations. However, despite acknowledging the significance of digit order in arithmetic computation, current methodologies predominantly rely on sequential, step-by-step approaches for teaching LLMs arithmetic, resulting in a conclusion where obtaining better performance involves fine-grained step-by-step. Diverging from this conventional path, our work introduces...

10.48550/arxiv.2403.05845 preprint EN arXiv (Cornell University) 2024-03-09

Recent advancements in sequential modeling applied to Electronic Health Records (EHR) have greatly influenced prescription recommender systems. While the recent literature on drug recommendation has shown promising performance, the study of discovering a diversity of coexisting temporal relationships at the level of medical codes over consecutive visits remains less explored. The goal of this study can be motivated from two perspectives. First, there is a need to develop a sophisticated sequential model capable of disentangling the complex...

10.1145/3627673.3679836 preprint EN 2024-10-20

Survival analysis plays a crucial role in many healthcare decisions, where the risk prediction for the events of interest can support an informative outlook for a patient's medical journey. Given the existence of data censoring, an effective way of survival analysis is to enforce the pairwise temporal concordance between censored and observed data, aiming to utilize the time interval before censoring as partially observed time-to-event labels for supervised learning. Although existing studies have mostly employed ranking methods to pursue an ordering...

10.1145/3583780.3614824 article EN cc-by 2023-10-21
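
The notion of pairwise temporal concordance under censoring, referred to in the abstract above, can be made concrete with Harrell's concordance index, a standard evaluation measure. The sketch below is illustrative only and is not the method proposed in the paper; the function name and argument layout are assumptions. A pair (i, j) is comparable when subject i has an observed event and a strictly earlier time than subject j, whether j is observed or censored.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    times  : observed time (event time or censoring time) per subject
    events : True if the event was observed, False if censored
    risks  : predicted risk score (higher means the event is expected sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event received the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A censored subject still contributes information here: it is known to have survived past every observed event that occurred before its censoring time.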

Image inpainting is the process of filling in missing parts of damaged images based on information gleaned from surrounding areas. In this paper, we present two variational models for image inpainting. Combining the two models, we can simultaneously fill in missing, corrupted or undesirable information, while removing noise. We explain that the diffusion performance of the proposed models is essentially superior to the TV model by analysing the physical characteristics in local coordinates, and investigate the existence of minimising functionals in BV...

10.1179/1743131x11y.0000000055 article EN The Imaging Science Journal 2012-03-08

Proceedings of the 2016 SIAM International Conference on Data Mining (SDM). The Impact of Community Safety on House Ranking. Zijun Yao, Yanjie Fu, Bin Liu, and Hui Xiong. pp. 459-467. Abstract: It is well recognized that community safety, which affects people's right to live without fear of crime, has considerable...

10.1137/1.9781611974348.52 article EN 2016-06-30

In online advertising, it is critical for advertisers to forecast the conversion rate (CVR) of campaigns. Previous work on campaign forecasting concentrates on time-series analysis, which depends on the availability of a length of campaign history. However, these approaches become inadequate for cold-start campaigns, which lack observation in the past. In this work, we attempt to mitigate this challenge by learning an unsupervised and composite embedding to capture multi-view semantic relationships in campaign information, and consequently using the nearest neighbor...

10.1109/tbdata.2022.3162150 article EN IEEE Transactions on Big Data 2022-03-24

Word embedding aims to learn the dense representation of words and has become a regular input preparation in many NLP tasks. Due to the data and computation intensive nature of learning embeddings from scratch, a more affordable way is to borrow the pretrained embedding available in public and fine-tune the embedding through a domain specific downstream dataset. A privacy concern can arise if a malicious owner of the pretrained embedding gets access to the fine-tuned embedding and tries to infer the critical information from the downstream datasets. In this study, we propose a novel embedding inversion framework called Invernet that...

10.18653/v1/2022.findings-emnlp.368 article EN cc-by 2022-01-01

Introduction: Calcific aortic valve disease (CAVD) is the third most common cardiovascular disease in aging populations. Despite a growing number of biomarkers having been shown to be associated with CAVD, a marker suitable for routine testing in clinical practice is still needed. Plasma cell-free DNA (cfDNA) has been suggested as a biomarker for the diagnosis and prognosis of multiple diseases. In this study, we aimed to test whether cfDNA could be used as a biomarker for CAVD....

10.1159/000534229 article EN cc-by Cardiology 2023-10-27