Evaluating Large Language Models: The Overfitting Problem — PLINKFEED