Llama 1B Inference Achieved in Single CUDA Kernel - Rollup News