A Comparative Study of LLMs for Infrastructure-as-Code Generation and Optimization
Abstract
The growing adoption of Infrastructure-as-Code (IaC) in contemporary DevOps pipelines has driven a steady rise in demand for automation tools that are not only effective but also secure and stable. Large Language Models (LLMs) have emerged as strong generative AI candidates for producing code for IaC platforms such as Terraform and AWS CloudFormation. This paper presents an objective comparison of state-of-the-art proprietary and open-source LLMs, including GPT-4, PaLM 2, Claude, Code LLaMA, StarCoder, and CodeGen, for the generation and optimization of IaC scripts. Using a multi-criteria evaluation process, the models were compared on code correctness, optimization quality, platform compatibility, and security compliance across 30 real-world infrastructure scenarios. The results show that proprietary models such as GPT-4 consistently outperform their open-source counterparts, returning syntactically correct, optimized, and secure infrastructure patterns suitable for production-level deployments. Open-source models, by contrast, proved valuable mainly for experimentation and rapid prototyping, exhibiting higher variability and recurring issues, particularly with security compliance and intelligent optimization. These findings indicate that deliberate model selection and rigorous validation are required when LLMs are applied to critical infrastructure design. The study offers practitioners actionable insight into the capabilities and limitations of current LLMs for IaC automation and outlines directions for future research on using generative AI for secure and scalable infrastructure management.
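To make the multi-criteria evaluation concrete, the following Python sketch shows one possible way to aggregate per-criterion scores for a generated IaC script into a single value. It is a hypothetical illustration only; the criterion names mirror those listed above, but the equal weighting, the 0-to-1 scale, and the `IaCEvaluation`/`overall_score` names are assumptions, not the paper's actual scoring procedure.

```python
# Hypothetical illustration: aggregating the four evaluation criteria
# (correctness, optimization, compatibility, security) into one score.
# Weights and scale are assumptions, not the study's actual rubric.
from dataclasses import dataclass


@dataclass
class IaCEvaluation:
    correctness: float    # 0..1, e.g. share of scenarios that validate/apply cleanly
    optimization: float   # 0..1, quality of resource sizing and cost choices
    compatibility: float  # 0..1, portability across Terraform / CloudFormation targets
    security: float       # 0..1, compliance with baseline security checks

# Illustrative equal weights; the paper does not specify a weighting here.
WEIGHTS = {
    "correctness": 0.25,
    "optimization": 0.25,
    "compatibility": 0.25,
    "security": 0.25,
}


def overall_score(e: IaCEvaluation) -> float:
    """Weighted sum of the four evaluation criteria."""
    return (WEIGHTS["correctness"] * e.correctness
            + WEIGHTS["optimization"] * e.optimization
            + WEIGHTS["compatibility"] * e.compatibility
            + WEIGHTS["security"] * e.security)


if __name__ == "__main__":
    # Example: a script that is correct and secure but only moderately optimized.
    example = IaCEvaluation(correctness=0.9, optimization=0.7,
                            compatibility=0.8, security=0.85)
    print(f"Overall score: {overall_score(example):.2f}")
```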