Maximum tardiness reflects the worst level of service with respect to customer needs. We therefore investigate the principle by which seru production reduces maximum tardiness and establish a model that minimizes the maximum tardiness of a seru production system. To obtain exact solutions, the non-linear model is split into a seru formation model and a linear seru scheduling model. For large-scale problems, we propose an efficient cooperative algorithm (CAGARL) that combines a genetic algorithm (GA) with an innovative reinforcement learning algorithm. Specifically, the GA is designed for the seru formation problem, while the QL-seru algorithm (QLSA), which combines features of meta-heuristics and reinforcement learning, is designed for the seru scheduling problem. Within the QLSA, we design an innovative QL-seru table and two state trimming rules to save computational time. In extensive experiments, CAGARL improved on the previous algorithm by an average of 56.6%. Finally, we propose several managerial insights on reducing maximum tardiness.
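For readers unfamiliar with the objective, the following minimal sketch illustrates how maximum tardiness could be evaluated for a given assignment of jobs to serus. It uses the standard scheduling definition, in which the tardiness of a job is max(0, C_j - d_j) for completion time C_j and due date d_j. The function name `max_tardiness`, the `(processing_time, due_date)` job representation, and the simplifications (each seru processes its jobs one at a time in the listed order, starting at time 0, with fixed processing times and no setup times) are assumptions made for illustration; they are not the model established in this work.

```python
from typing import Dict, List, Tuple

# Hypothetical job representation: (processing_time, due_date).
Job = Tuple[float, float]


def max_tardiness(schedule: Dict[str, List[Job]]) -> float:
    """Return the largest tardiness over all jobs in all serus.

    Assumes each seru starts at time 0 and processes its jobs
    sequentially in the order listed, with no setup times.
    """
    worst = 0.0  # tardiness is never negative
    for jobs in schedule.values():
        clock = 0.0  # completion time of the last job in this seru
        for processing_time, due_date in jobs:
            clock += processing_time
            worst = max(worst, clock - due_date)
    return worst


# Illustrative data only: two serus, two jobs each.
schedule = {
    "seru_1": [(3.0, 4.0), (2.0, 4.0)],
    "seru_2": [(4.0, 5.0), (1.0, 9.0)],
}
print(max_tardiness(schedule))  # prints 1.0
```

In this toy instance, only the second job of `seru_1` is late (it completes at time 5 against a due date of 4), so the maximum tardiness is 1.0. A formation and scheduling method would search over job-to-seru assignments and sequences to make this value as small as possible.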