Strategic bidding in freight transport using deep reinforcement learning

Keywords: Deep reinforcement learning; FOS: Computer and information sciences; Computer Science - Machine Learning; Self-organizing logistics; UT-Hybrid-D; 0211 other engineering and technologies; Policy gradient; 02 engineering and technology; Strategic bidding; Machine Learning (cs.LG)
DOI: 10.48550/arxiv.2102.09253 Publication Date: 2022-02-22
ABSTRACT
This paper presents a multi-agent reinforcement learning algorithm to represent strategic bidding behavior by carriers and shippers in freight transport markets. We investigate whether feasible market equilibria arise without central control or communication between agents. Observed behavior in such environments serves as a stepping stone towards self-organizing logistics systems like the Physical Internet, while also offering valuable insights for the design of contemporary transport brokerage platforms. We model an agent-based environment in which a shipper and a carrier actively learn bidding strategies using policy gradient methods, posing bid and ask prices at the individual container level. Both agents aim to learn the best response given the expected behavior of the opposing agent. Inspired by financial markets, a neutral broker allocates jobs based on bid-ask spreads. Our game-theoretical analysis and numerical experiments focus on behavioral insights. To evaluate system performance, we measure adherence to Nash equilibria, fairness of reward division, and utilization of transport capacity. We observe good performance both in predictable, deterministic settings (~95% adherence to Nash equilibria) and in highly stochastic environments (~85% adherence). Risk-seeking behavior may increase an agent's reward share, yet overly aggressive strategies destabilize the system. The results suggest a potential for full automation and decentralization of freight transport markets. These insights ease the design of real-world market platforms, suggesting an innate tendency of markets to reach equilibria without behavioral models, information sharing, or explicit incentives.
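The broker mechanism mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that a neutral broker allocates jobs based on bid-ask spreads; the specific rule below (accept a container only when the bid is at least the ask, and fill limited capacity with the largest spreads first) is an assumption made for illustration, as are the names `Offer` and `allocate`. It is not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """One container: the shipper's bid and the carrier's ask (hypothetical structure)."""
    container_id: int
    bid: float  # price the shipper is willing to pay
    ask: float  # price the carrier demands

    @property
    def spread(self) -> float:
        # Positive spread means the shipper offers more than the carrier asks.
        return self.bid - self.ask


def allocate(offers: list[Offer], capacity: int) -> list[Offer]:
    """Assumed broker rule: keep offers with a non-negative spread,
    then fill the available capacity with the largest spreads first."""
    feasible = [o for o in offers if o.spread >= 0]
    feasible.sort(key=lambda o: o.spread, reverse=True)
    return feasible[:capacity]


offers = [
    Offer(container_id=1, bid=10.0, ask=8.0),
    Offer(container_id=2, bid=5.0, ask=7.0),   # bid below ask: never allocated
    Offer(container_id=3, bid=12.0, ask=6.0),
    Offer(container_id=4, bid=9.0, ask=9.0),
]
shipped = allocate(offers, capacity=2)
print([o.container_id for o in shipped])
```

Under this assumed rule, a shipper who bids too low or a carrier who asks too much risks leaving the container unshipped, which is the tension the learning agents must resolve.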