Deep Reinforcement Learning for Packaging Optimization

Dowdell, Emily
Deep Reinforcement Learning, Reinforcement Learning, Packaging Optimization, Packing Optimization, Layout Optimization
Packing, packaging, and layout optimization is required for efficient systems design across a wide range of industries. The increasing demand for lighter, smaller, and more efficient designs that can be produced rapidly has highlighted the need for computational tools to solve these optimization problems. Current methods for packing and layout optimization remain limited to simple examples and do not scale to the complex packaging optimization problems encountered in industry. Recent advances in computational power and memory have enabled the adoption of reinforcement learning (RL) for various applications. The development of multi-agent reinforcement learning for multi-agent systems has broadened the variety of tasks that can be modelled with RL. The rapid development of RL methods has prompted the investigation of deep reinforcement learning (DRL) for optimization. Owing to its inherent properties, DRL may offer several advantages over current packing optimization methods. However, the application of DRL to practical optimization tasks remains limited, so further investigation is required. In this work, an investigation of deep reinforcement learning for packaging optimization is presented. A methodology for formulating packing optimization problems as an RL task is proposed, and the selection of appropriate single-agent and multi-agent algorithms for implementation is discussed. Several algorithm formulations are trained and tested to assess the performance of DRL for packing optimization. The multi-agent framework is more effective because multiple agents can coordinate and be conditioned on localized behaviour. Many high-quality solutions to the four-square packing problem were obtained. Stochasticity was successfully combined with exploration to produce higher-quality results. Although the algorithm showed the ability to learn heuristics inherently, alternative reward structures may be required to teach the agents to refine the solution.
The poor performance of the proposed method on the four-shape packing problem revealed the need for an improved overlap constraint penalty. For complex shapes, a more sophisticated exploration scheme is required to search the multimodal packing space effectively. This work lays the groundwork for further study of DRL for packing optimization and demonstrates the merit of future research on DRL for complex packaging optimization.
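To make the idea of "formulating packing optimization as an RL task" and an "overlap constraint penalty" more concrete, the following is a minimal sketch under stated assumptions. It is a hypothetical toy formulation, not the formulation used in this thesis. Each axis-aligned square is treated as one agent that picks one of five discrete moves per step, and the reward penalizes the area of the smallest enclosing square plus a weighted total pairwise overlap area. The class name `SquarePackingEnv`, the `MOVES` action set, `step_size`, and `overlap_weight` are all illustrative assumptions. No learning algorithm is included; a DRL agent (single- or multi-agent) would be trained against an environment like this.

```python
import random

# Hypothetical discrete action set: stay, +x, -x, +y, -y (one action per square/agent).
MOVES = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


class SquarePackingEnv:
    """Toy environment (assumed formulation): pack axis-aligned squares tightly.

    State: lower-left corner of each square.
    Reward: -(side of enclosing square)^2 - overlap_weight * total pairwise overlap.
    """

    def __init__(self, sizes=(1.0, 1.0, 1.0, 1.0), step_size=0.1,
                 overlap_weight=10.0, seed=None):
        self.sizes = list(sizes)
        self.step_size = step_size
        self.overlap_weight = overlap_weight  # strength of the overlap penalty
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Scatter squares uniformly in a loose region so they start mostly apart.
        spread = 2.0 * sum(self.sizes)
        self.pos = [[self.rng.uniform(0.0, spread), self.rng.uniform(0.0, spread)]
                    for _ in self.sizes]
        return [p[:] for p in self.pos]

    def overlap_area(self):
        # Sum of intersection areas over all pairs of squares.
        total = 0.0
        n = len(self.sizes)
        for i in range(n):
            for j in range(i + 1, n):
                (xi, yi), si = self.pos[i], self.sizes[i]
                (xj, yj), sj = self.pos[j], self.sizes[j]
                dx = min(xi + si, xj + sj) - max(xi, xj)
                dy = min(yi + si, yj + sj) - max(yi, yj)
                total += max(dx, 0.0) * max(dy, 0.0)
        return total

    def bounding_side(self):
        # Side length of the smallest axis-aligned square enclosing every piece.
        x_lo = min(x for x, _ in self.pos)
        y_lo = min(y for _, y in self.pos)
        x_hi = max(x + s for (x, _), s in zip(self.pos, self.sizes))
        y_hi = max(y + s for (_, y), s in zip(self.pos, self.sizes))
        return max(x_hi - x_lo, y_hi - y_lo)

    def reward(self):
        return -self.bounding_side() ** 2 - self.overlap_weight * self.overlap_area()

    def step(self, actions):
        # Apply one move per square (multi-agent style), then score the layout.
        for p, a in zip(self.pos, actions):
            dx, dy = MOVES[a]
            p[0] += self.step_size * dx
            p[1] += self.step_size * dy
        return [p[:] for p in self.pos], self.reward()
```

For example, placing four unit squares in a 2 x 2 grid gives zero overlap and a reward of -4.0; nudging one square into its neighbour introduces overlap, which the penalty term then reduces the reward for. Note that `step` here returns only `(observation, reward)`, a simplification compared with interfaces such as Gymnasium's. The weight of the overlap term is exactly the kind of design choice the abstract flags as needing improvement for more complex shapes.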