Renyuan Xu

ORCID: 0000-0003-4293-3450
Research Areas
  • Stochastic processes and financial applications
  • Reinforcement Learning in Robotics
  • Advanced Bandit Algorithms Research
  • Economic theories and models
  • Adaptive Dynamic Programming Control
  • Advanced Control Systems Optimization
  • Complex Systems and Time Series Analysis
  • Auction Theory and Applications
  • Stock Market Forecasting Methods
  • Risk and Portfolio Optimization
  • Financial Markets and Investment Strategies
  • Model Reduction and Neural Networks
  • Credit Risk and Financial Regulations
  • Smart Grid Energy Management
  • Experimental Behavioral Economics Studies
  • Game Theory and Voting Systems
  • Optimization and Search Problems
  • Financial Risk and Volatility Modeling
  • Climate Change Policy and Economics
  • Neural Networks and Applications
  • Distributed Control Multi-Agent Systems
  • Monetary Policy and Economic Impact
  • Topic Modeling
  • Stochastic Gradient Optimization Techniques
  • Cognitive Radio Networks and Spectrum Sensing

Fujian Normal University
2025

Southwest University
2024-2025

Renmin University of China
2025

New York University
2020-2025

University of Southern California
2019-2024

Inner Mongolia University
2024

University of Oxford
2019-2023

Southern California University for Professional Studies
2022

University of California, Berkeley
2018-2020

JPMorgan Chase & Co (United States)
2019

The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This...

10.1111/mafi.12382 article EN cc-by Mathematical Finance 2023-04-07

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed...

10.48550/arxiv.2401.02954 preprint EN other-oa arXiv (Cornell University) 2024-01-01

The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey...

10.2139/ssrn.3971071 article EN SSRN Electronic Journal 2021-01-01

Article data. Submitted: 24 November 2020. Accepted: 15 June 2021. Published online: 28 September 2021. Keywords: linear quadratic regulator, reinforcement learning, policy gradient method, stochastic control, optimal liquidation, execution. AMS Subject Headings: 68Q25, 68R10, 68U05. ISSN (print): 0363-0129. ISSN (online): 1095-7138. Publisher: Society for Industrial and Applied Mathematics. CODEN: sjcodc

10.1137/20m1382386 article EN SIAM Journal on Control and Optimization 2021-01-01

Entropy regularization has been extensively adopted to improve the efficiency, stability, and convergence of algorithms in reinforcement learning. This paper analyzes both quantitatively and qualitatively the impact of entropy regularization for mean field games (MFGs) with learning in a finite time horizon. Our study provides a theoretical justification that entropy regularization yields time-dependent policies and, furthermore, helps in stabilizing and accelerating convergence to the game equilibrium. In addition, this study leads to a policy-gradient algorithm for exploration in MFG....

10.1287/moor.2021.1238 article EN Mathematics of Operations Research 2022-02-25

This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium to this GMFG, and explains that naively combining Q-learning with the fixed-point approach in classical MFGs yields unstable algorithms. It then proposes a Q-learning algorithm with Boltzmann policy (GMF-Q), with analysis of its convergence property and computational complexity. The experiments on repeated Ad auction problems...

10.48550/arxiv.1901.09585 preprint EN other-oa arXiv (Cornell University) 2019-01-01

This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium to this GMFG, and it demonstrates that naively combining reinforcement learning with the fixed-point approach in classical mean-field games yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, with analysis of their convergence properties and computational...

10.1287/moor.2022.1274 article EN Mathematics of Operations Research 2022-06-21

In this paper we formulate and analyze an $N$-player stochastic game of the classical fuel follower problem and its mean field game (MFG) counterpart. For the $N$-player game, we obtain the Nash equilibrium (NE) explicitly by deriving and analyzing a system of Hamilton--Jacobi--Bellman equations and by establishing the existence of a unique strong solution to the associated Skorokhod problem on an unbounded polyhedron with an oblique reflection. For the MFG, we derive a bang-bang type NE under some mild technical conditions and by the viscosity solution approach. We also show that this solution is...

10.1137/17m1159531 article EN SIAM Journal on Control and Optimization 2019-01-01

Article data. Submitted: 15 October 2020. Accepted: 16 August 2021. Published online: 28 2021. Keywords: mean-field control, multi-agent reinforcement learning, Q-learning, cooperative games, dynamic programming principle. AMS Subject Headings: 49N80, 68Q32, 68T05, 90C40. ISSN (online): 2577-0187. Publisher: Society for Industrial and Applied Mathematics. CODEN: sjmdaq

10.1137/20m1360700 article EN SIAM Journal on Mathematics of Data Science 2021-01-01

One of the challenges for multiagent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information of the entire system. Whereas exciting progress has been made to analyze decentralized MARL with the network of agents for social networks and team video games, little is known theoretically for decentralized MARL with the network of states for modeling self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework of localized training and decentralized execution to study MARL with the network of states....

10.1287/moor.2022.0055 article EN Mathematics of Operations Research 2024-03-13

Unintentional man-made disasters, such as the Chernobyl disaster, constitute the majority of man-made disasters. Despite their prevalence, there is a lack of systematic analysis of their socioeconomic consequences. Utilizing a unique dataset on extremely severe accidents (ESAs) in China and nationally representative longitudinal household surveys, we find that unintentional man-made disasters significantly negatively impact individual trust in government and risk-taking attitudes. However, ESAs do not affect trust in neighbors. The severity...

10.2139/ssrn.5037749 preprint EN 2025-01-01

This phenomenological study delves into the learning experiences of Chinese undergraduate students enrolled in a transnational International Economics and Trade program, which is collaboratively run by Chinese and Australian universities. Through in-depth semi-structured interviews with 23 students from various years of study, the research aims to uncover how these students navigate and derive meaning from their learning experiences within a cross-cultural educational setting. Data analysis reveals three prominent themes: differences in course design for...

10.12709/mest.13.13.01.12 article EN cc-by MEST Journal 2025-01-14

10.1137/23m1581881 article EN SIAM Journal on Financial Mathematics 2025-06-02

In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-$K$ out of $N$ experts, face challenges in ensuring expert specialization, i.e. each expert acquires non-overlapping and focused knowledge. In response, we propose the DeepSeekMoE architecture towards ultimate expert specialization. It involves two principal strategies: (1) finely segmenting the experts...

10.48550/arxiv.2401.06066 preprint EN other-oa arXiv (Cornell University) 2024-01-01

Multiagent systems, such as recommendation systems, ride-sharing platforms, food-delivery systems, and data-routing centers, are areas of rapid technology development that require constant improvements to address the lack of efficiency and the curse of dimensionality. In the paper “Dynamic Programming Principles for Mean-Field Controls with Learning,” we show that multiagent systems with mean-field approximation and learning can be recast as general forms of reinforcement learning problems, where the state variable is replaced by the probability...

10.1287/opre.2022.2395 article EN Operations Research 2023-01-12

Dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and more recently mean-field controls (MFCs). However, in the learning framework of MFCs, DPP has not been rigorously established, despite its critical importance for algorithm designs. In this paper, we first present a simple example in MFCs with learning where DPP fails with a mis-specified Q function; and then propose the correct form of Q function in an appropriate space for MFCs with learning. This...

10.48550/arxiv.1911.07314 preprint EN other-oa arXiv (Cornell University) 2019-01-01

We use a spatial epidemic model with demographic and geographic heterogeneity to study the regional dynamics of COVID-19 across 133 regions in England. Our model emphasises the role of variability of outcomes across age groups and locations, and provides a framework for assessing the impact of policies targeted towards sub-populations or regions. We define a concept of efficiency for comparative analysis of control policies and show mitigation policies based on local monitoring to be more efficient than country-level or non-targeted measures. In particular, our results...

10.2139/ssrn.3681507 article EN SSRN Electronic Journal 2020-01-01

Entropy regularization has been extensively adopted to improve the efficiency, stability, and convergence of algorithms in reinforcement learning. This paper analyzes both quantitatively and qualitatively the impact of entropy regularization for Mean Field Game (MFG) with learning in a finite time horizon. Our study provides a theoretical justification that entropy regularization yields time-dependent policies and, furthermore, helps in stabilizing and accelerating convergence to the game equilibrium. In addition, this study leads to a policy-gradient algorithm for exploration in MFG....

10.48550/arxiv.2010.00145 preprint EN other-oa arXiv (Cornell University) 2020-01-01

We analyze a class of stochastic differential games of singular control, motivated by the study of a dynamic model of interbank lending with benchmark rates. We describe Pareto optima for this game and show how they may be achieved through the intervention of a regulator, whose policy is a solution to a singular stochastic control problem. Pareto optima are characterized in terms of the solutions to a new class of Skorokhod problems with piecewise-continuous free boundary. Pareto optimal policies are shown to correspond to the enforcement of endogenous bounds on interbank lending rates. Analytical comparison between...

10.1111/mafi.12325 article EN Mathematical Finance 2021-07-13

In this paper we propose and analyze a class of $N$-player stochastic games that include finite fuel stochastic games as a special case. We first derive sufficient conditions for the Nash equilibrium (NE) in the form of a verification theorem. The associated quasi-variational-inequalities include an essential game component regarding the interactions among players, which may be interpreted as the analytical representation of the conditional optimality for NEs. The derivation of NEs involves solving first a multidimensional free boundary problem and then a Skorokhod...

10.1137/20m1322558 article EN SIAM Journal on Control and Optimization 2022-03-17

The estimation of loss distributions for dynamic portfolios requires the simulation of scenarios representing realistic joint dynamics of their components, with particular importance devoted to tail risk scenarios. We propose a novel data-driven approach that utilizes a Generative Adversarial Network (GAN) architecture and exploits the elicitability property of Value-at-Risk (VaR) and Expected Shortfall (ES). Our proposed approach is capable of learning to simulate price scenarios that preserve tail risk features for benchmark trading strategies,...

10.48550/arxiv.2203.01664 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01