Llama2-70B Shows Strong Performance on Cisco C845A

As artificial intelligence workloads grow in scale, it is increasingly important to measure how efficiently hardware and software handle them. The MLPerf Inference 5.1 benchmark suite provides a standardized way to do this. In recent tests, a Cisco C845A rack server equipped with eight NVIDIA H200 NVL PCIe GPUs was used to evaluate the performance of Meta's Llama2-70B, a large language model with 70 billion parameters.


Llama2 is designed for tasks like text generation, translation, summarization, and question answering. It is openly available under a license that permits commercial use, making it popular in AI research and development.

In the MLPerf tests at the 99 percent accuracy target, Llama2-70B achieved 29,593 tokens per second in the Server scenario and 31,429 tokens per second in the Offline scenario. At the stricter 99.9 percent target, throughput was nearly identical, showing the system can meet the higher accuracy bar without slowing down.
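As a quick sanity check on the gap between the two scenarios, the reported figures can be compared directly. This is a minimal sketch that uses only the numbers quoted above; the Server scenario is latency-bound, which is why Offline throughput typically runs a few percent higher.

```python
# Illustrative arithmetic on the reported MLPerf throughput figures
# (numbers taken from the results quoted above, 99% accuracy target).
server_tps = 29_593   # tokens/second, Server scenario
offline_tps = 31_429  # tokens/second, Offline scenario

# Relative throughput advantage of Offline over the latency-bound Server run
offline_gain_pct = (offline_tps - server_tps) / server_tps * 100
print(f"Offline is {offline_gain_pct:.1f}% faster than Server")
# → Offline is 6.2% faster than Server
```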

Interactive tests focus on real-time responsiveness under tighter latency constraints. Llama2-70B-Interactive achieved 16,143 tokens per second at the 99 percent accuracy target and 16,149 tokens per second at 99.9 percent, demonstrating that the Cisco C845A delivers consistent performance even in latency-sensitive workloads.


These results indicate that Cisco C845A servers with NVIDIA H200 NVL GPUs are a reliable platform for running large language models, handling both heavy batch workloads and real-time interactive workloads with high throughput and accuracy. Standardized benchmarks like MLPerf help organizations choose the right hardware for next-generation AI applications.

