Discrepancy between Llama-3.1-8B-Instruct results on LongBench v2 and the leaderboard #94
Comments
Your results look quite close to ours; I think the gap is basically within random error. What truncation strategy are you using?
My deployment and testing setup follows the example. Deployment: Testing: `python pred.py --model ${model_path}`
I only got 28.2, with the random seed fixed at 42. The variance seems quite large...
The randomness here is not caused by the random seed alone.
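To make the point above concrete: fixing a seed only pins down the Python-level RNG state; it does not control other sources of nondeterminism during inference (e.g. GPU kernel scheduling or vLLM's batching order), so scores can still drift run to run. A minimal sketch, with the hypothetical helper `seeded_draws` standing in for any seeded sampling step:

```python
import random

def seeded_draws(seed: int, n: int = 3) -> list:
    # Re-seeding before each run makes the *Python-level* randomness
    # reproducible; it cannot control GPU kernel nondeterminism or
    # the serving engine's batching/scheduling order.
    random.seed(seed)
    return [random.random() for _ in range(n)]

# Same seed -> identical draws; different seed -> different draws.
assert seeded_draws(42) == seeded_draws(42)
assert seeded_draws(42) != seeded_draws(7)
```

In other words, a fixed seed rules out one source of variance, but a point-level gap between runs can still come from the rest of the inference stack.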
Hello,
My results for Llama-3.1-8B-Instruct are as follows:
| Model | Overall | Easy | Hard | Short | Medium | Long |
| --- | --- | --- | --- | --- | --- | --- |
| Llama-3.1-8B-Instruct | 29.0 | 30.7 | 28.0 | 33.9 | 25.6 | 27.8 |
This is one point below the Overall score on the leaderboard (29.0 vs 30.0). My environment is:
vllm==0.5.3.post1
transformers==4.45.0
Does evaluating Llama-3.1-8B-Instruct require any special handling?