
Llama2-70B Shows Strong Performance on Cisco C845A

As artificial intelligence workloads continue to grow, it is important to measure how efficiently hardware and software handle real-world inference. The MLPerf Inference 5.1 benchmark provides a standardized way to do this. In recent tests, a Cisco C845A rack server equipped with eight NVIDIA H200 NVL PCIe GPUs was used to evaluate the performance of Meta’s Llama2-70B, a large language model with 70 billion parameters.


Llama2 is designed for tasks like text generation, translation, summarization, and question answering. Its weights are openly available under Meta’s community license, which permits commercial use, making it popular in AI research and development.

In the MLPerf tests at the 99 percent accuracy target, Llama2-70B delivered 29,593 tokens per second in the Server scenario and 31,429 tokens per second in the Offline scenario. At the stricter 99.9 percent accuracy target, throughput was nearly identical, showing that the system can maintain very high accuracy without slowing down.
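For a rough sense of scale, the aggregate figures above can be divided across the eight H200 NVL GPUs. This is a minimal sketch assuming throughput divides evenly per accelerator, which MLPerf results do not report or guarantee:

```python
# Derive approximate per-GPU throughput from the reported aggregates.
# Assumption: work is spread evenly across all eight GPUs.
GPU_COUNT = 8

results_tokens_per_sec = {
    "Server (99% target)": 29_593,
    "Offline (99% target)": 31_429,
}

for scenario, total in results_tokens_per_sec.items():
    per_gpu = total / GPU_COUNT
    print(f"{scenario}: ~{per_gpu:,.0f} tokens/s per GPU")
```

On these numbers, each GPU contributes roughly 3,700–3,900 tokens per second, a quick sanity check when comparing against systems with different accelerator counts.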



Interactive tests focus on real-time responsiveness under tighter latency constraints. Llama2-70B-Interactive achieved 16,143 tokens per second at the 99 percent accuracy target and 16,149 tokens per second at 99.9 percent. This demonstrates that even in interactive workloads, the Cisco C845A server delivers consistent performance.
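The cost of those tighter latency limits can be seen by comparing the interactive result against the Server-scenario figure. A minimal sketch, using only the numbers reported above:

```python
# Compare interactive vs. server throughput (99% accuracy target).
server_tps = 29_593       # tokens/s, Server scenario
interactive_tps = 16_143  # tokens/s, Interactive scenario

ratio = interactive_tps / server_tps
print(f"Interactive retains {ratio:.0%} of Server-scenario throughput")
```

The interactive configuration keeps roughly 55 percent of the Server-scenario throughput while meeting stricter response-time requirements, which is the trade-off the interactive benchmark is designed to expose.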


These results show that Cisco C845A servers with NVIDIA H200 GPUs are a reliable choice for running large language models. They can handle both heavy batch tasks and real-time interactive tasks with high speed and accuracy. Benchmarks like MLPerf help organizations choose the right hardware for next-generation AI applications.



GSN

Global Supercomputing Platform

© 2025 Amaryllo Inc. All rights reserved. All trademarks are the property of their respective owners.
