| Path | Lines of Code |
| --- | ---: |
| README.md | 71 |
| contents/automated-benchmarks/basics.md | 29 |
| contents/automated-benchmarks/designing-your-automatic-evaluation.md | 97 |
| contents/automated-benchmarks/some-evaluation-datasets.md | 147 |
| contents/automated-benchmarks/tips-and-tricks.md | 40 |
| contents/general-knowledge/model-inference-and-evaluation.md | 51 |
| contents/general-knowledge/tokenization.md | 49 |
| contents/human-evaluation/basics.md | 46 |
| contents/human-evaluation/tips-and-tricks.md | 17 |
| contents/human-evaluation/using-human-annotators.md | 24 |
| contents/model-as-a-judge/basics.md | 25 |
| contents/model-as-a-judge/designing-your-evaluation-prompt.md | 30 |
| contents/model-as-a-judge/evaluating-your-evaluator.md | 15 |
| contents/model-as-a-judge/getting-a-judge-llm.md | 26 |
| contents/model-as-a-judge/tips-and-tricks.md | 24 |
| contents/model-as-a-judge/what-about-reward-models.md | 34 |
| contents/troubleshooting/troubleshooting-inference.md | 38 |
| contents/troubleshooting/troubleshooting-math-parsing.md | 235 |
| contents/troubleshooting/troubleshooting-reproducibility.md | 70 |
| resources/About NLP.md | 11 |
| resources/About evaluation.md | 32 |
| translations/CONTRIBUTING.md | 10 |
| translations/zh/contents/automated-benchmarks/basics.md | 28 |
| translations/zh/contents/automated-benchmarks/designing-your-automatic-evaluation.md | 97 |
| translations/zh/contents/automated-benchmarks/tips-and-tricks.md | 40 |
| translations/zh/contents/general-knowledge/model-inference-and-evaluation.md | 51 |
| translations/zh/contents/general-knowledge/tokenization.md | 49 |
| translations/zh/contents/human-evaluation/basics.md | 46 |
| translations/zh/contents/human-evaluation/tips-and-tricks.md | 17 |
| translations/zh/contents/human-evaluation/using-human-annotators.md | 24 |
| translations/zh/contents/model-as-a-judge/basics.md | 25 |
| translations/zh/contents/model-as-a-judge/designing-your-evaluation-prompt.md | 30 |
| translations/zh/contents/model-as-a-judge/evaluating-your-evaluator.md | 15 |
| translations/zh/contents/model-as-a-judge/getting-a-judge-llm.md | 26 |
| translations/zh/contents/model-as-a-judge/tips-and-tricks.md | 24 |
| translations/zh/contents/model-as-a-judge/what-about-reward-models.md | 34 |
| translations/zh/contents/troubleshooting/troubleshooting-inference.md | 38 |
| translations/zh/contents/troubleshooting/troubleshooting-math-parsing.md | 233 |
| translations/zh/contents/troubleshooting/troubleshooting-reproducibility.md | 70 |