1 min read · Apr 24, 2020
I have another question. The authors propose using the h representations for linear evaluation, i.e. the intermediate representations taken before the projection head. If my understanding is correct, you used the projected representations in your evaluation instead. Could you elaborate on that choice?
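To make the distinction concrete, here is a minimal sketch of what I mean. It is a toy in plain Python, not your code or the authors' implementation: the names `encoder`, `projection_head`, and `features_for_linear_eval`, the random weights, and all the dimensions are assumptions made up for illustration.

```python
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters (hypothetical)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(v, 0.0) for v in row] for row in m]

# Assumed toy sizes: input 32, h 16, z 8.
W_enc = rand_matrix(32, 16)
W_proj1 = rand_matrix(16, 16)
W_proj2 = rand_matrix(16, 8)

def encoder(x):
    """h = f(x): the representation before the projection head."""
    return relu(matmul(x, W_enc))

def projection_head(h):
    """z = g(h): a small MLP whose output feeds the contrastive loss."""
    return matmul(relu(matmul(h, W_proj1)), W_proj2)

x = rand_matrix(4, 32)   # a batch of 4 toy inputs
h = encoder(x)           # shape (4, 16)
z = projection_head(h)   # shape (4, 8)

# Linear evaluation as I read the paper: the linear classifier is fit on h,
# and the projection head is not applied at evaluation time.
features_for_linear_eval = h
```

My question is whether your evaluation used the equivalent of `z` here rather than `h`.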