The state of AI infrastructure at scale: Exposing GPU utilization challenges


The AI Infrastructure Alliance, MLOps company ClearML and chip firm FuriosaAI have teamed up to assess what business executives think about artificial intelligence. The output is a new report titled “The State of AI Infrastructure at Scale 2024: Unveiling Future Landscapes, Key Insights, and Business Benchmarks”. The report includes responses from AI/ML and technology leaders across North America, Europe, and Asia Pacific, addressing the issues and obstacles to scaling up.

Many executives reported that having and using open source technology is important for their organization, with most focused on customizing open source models. PyTorch, a machine learning library used for applications such as computer vision and natural language processing, is their framework of choice.

The assessment revealed that the biggest challenge in scaling AI is compute limitations (an issue of both availability and cost). The next biggest challenge was infrastructure issues.

The report's central concerns are:

How executives are building their AI infrastructure.
The critical benchmarks and key challenges they face.
How they rank priorities when evaluating AI infrastructure solutions against their business use cases.

More specifically, in relation to the compute concerns, latency was the top-ranked issue, followed by power consumption. To address this, the majority of executives plan to use more cloud compute, and many will buy more on-premises GPU machines in 2024. A graphics processing unit (GPU) is an electronic circuit that can perform mathematical calculations at high speed; computing tasks like graphics rendering, machine learning, and video editing require applying similar mathematical operations across a large dataset.

On the issue of latency, over half of respondents plan to use language models (like Llama), followed by embedding models (BERT and family), in their commercial deployments. Mitigating compute challenges will be essential to their plans.

One challenge is the limited global supply of GPUs. A global chip shortage, triggered by the COVID-19 pandemic in 2020, severely hampered GPU production: the pandemic disrupted the global supply chain, causing delays in chip production and delivery. To counter GPU scarcity, most businesses are looking for, or are interested in, cost-effective alternatives to GPUs.

The main challenge in operating GPUs is job scheduling and management, especially coordinating tasks and workflows within the AI/ML technology stack, which is necessary to optimize GPU and compute resource allocation.
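
To make the scheduling problem concrete, here is a minimal sketch of one common heuristic: place the longest jobs first, each on whichever GPU frees up earliest. It is not taken from the report. The function name `schedule_jobs`, the job names, and the hour estimates are all hypothetical, and real schedulers also have to handle priorities, memory limits, and preemption.

```python
import heapq


def schedule_jobs(job_hours, num_gpus):
    """Greedy sketch: longest job first, each on the GPU that frees up earliest.

    job_hours maps a (hypothetical) job name to its estimated GPU-hours.
    Returns (assignments, makespan), where assignments maps job name -> GPU index
    and makespan is the time at which the last GPU finishes.
    """
    # Min-heap of (time at which the GPU becomes free, GPU index).
    free_at = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(free_at)

    assignments = {}
    # Placing long jobs first tends to balance the load across GPUs.
    for name, hours in sorted(job_hours.items(), key=lambda kv: -kv[1]):
        available, gpu = heapq.heappop(free_at)
        assignments[name] = gpu
        heapq.heappush(free_at, (available + hours, gpu))

    makespan = max(finish for finish, _ in free_at)
    return assignments, makespan


# Hypothetical workload, for illustration only.
jobs = {"train-a": 8.0, "train-b": 5.0, "eval": 2.0, "embed": 3.0}
assignments, makespan = schedule_jobs(jobs, num_gpus=2)
```

With this made-up workload, both GPUs stay busy for most of the run. A naive first-come assignment could leave one GPU idle while the other works through a queue.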

For those who already operate cloud compute systems, the main concerns are wastage and idle costs. In addition, there are misgivings about the cost of overall compute power consumption.
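
As a rough illustration of how idle costs add up, the sketch below estimates the spend that goes to unused GPU time at a given average utilization. The function name `idle_cost` and every figure in the example are hypothetical assumptions, not numbers from the report.

```python
def idle_cost(gpu_count, hourly_rate, hours, utilization):
    """Estimate the spend attributable to idle GPU time.

    utilization is the average fraction of time (0.0 to 1.0) the GPUs do useful work.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    total_spend = gpu_count * hourly_rate * hours
    return total_spend * (1.0 - utilization)


# Hypothetical figures: 8 cloud GPUs at $2.00 per hour for a 720-hour month,
# busy half of the time on average.
wasted = idle_cost(gpu_count=8, hourly_rate=2.0, hours=720, utilization=0.5)
```

Under these assumed figures, half of the monthly bill pays for GPUs that are doing nothing.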
