arXiv preprint arXiv:2405.19888, 2024.
FlashDecoding++: Faster large language model inference on GPUs. Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang.