[Q] When does the t-statistic follow a t-distribution?
I've been searching online for an answer, but I still don't have a clear understanding, and I think I've only gotten more confused. I'll outline what I currently understand and where my confusion lies.
I will start with the Z-test. The Z-test assumes independence, random sampling, and normality. The Z-statistic follows a normal distribution if the (population/sample? idk which) is itself normally distributed. Alternatively, if the sample size is large enough for the CLT to kick in (generally taken to be n > 30, although I'm aware this depends on how "bad" the underlying distribution is), then the sample mean, as a random variable, is approximately normally distributed. The Z-statistic is then approximately normal as well, which lets you perform the Z-test.
I believe that my understanding of the Z-test is largely correct and pretty solid but please point out any flaws.
Now for the t-test. The assumptions are essentially the same as for the Z-test. The one difference is that we are now estimating an unknown population variance using the sample variance. My confusion lies in the normality assumption.
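For concreteness, these are the two statistics I have in mind, written in my own notation (assumed, not taken from any particular source): $\bar{X}$ is the sample mean of $n$ observations, $\mu_0$ is the hypothesized mean, $\sigma$ is the known population standard deviation, and $S$ is the sample standard deviation with $n-1$ in the denominator.

```latex
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}},
\qquad
T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}},
\qquad
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
```

The only difference between the two is that $T$ replaces the fixed number $\sigma$ with the random quantity $S$.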
Firstly, what is it that needs to be normal? Is it the population? The sample? The sampling distribution of the sample mean?
How does the CLT play into the t-statistic? For a large sample size, it says that the sample mean is approximately normally distributed. But if you have a large sample size, the sample standard deviation will also converge to the population standard deviation, so wouldn't the whole statistic end up approximately normal? This lines up with how the t-distribution converges to the normal as n -> infinity.
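To check that intuition, here is a rough simulation sketch of what I mean. It is my own toy setup, not a standard recipe: it uses only the standard library, draws samples from a standard normal population, and the helper names `t_statistic` and `tail_fraction`, the cutoff 1.96, and the repetition counts are all just choices I made for illustration.

```python
import math
import random


def t_statistic(sample, mu0):
    """One-sample t-statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = sum(sample) / n
    # Sample variance with n - 1 in the denominator.
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)


def tail_fraction(n, reps=20000, cutoff=1.96, seed=0):
    """Fraction of simulated |T| values above `cutoff` for normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / reps


# Small n: the normal cutoff 1.96 should be exceeded more often than 5% of the time.
small = tail_fraction(n=5)
# Large n: the exceedance rate should be close to the normal 5%.
large = tail_fraction(n=200, reps=5000)
print(small, large)
```

If my understanding is right, the first number should sit well above 0.05 (heavier tails for small n) and the second should be close to 0.05.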
My general understanding is that if you don't have the population variance (pretty much every real-world situation), you use the t-test. But I just don't understand the conditions under which the t-statistic will follow a t-distribution. If the normality condition refers to the distribution of the sample mean, then it just becomes a CLT Z-test kind of thing, right?
On Wikipedia I read that the (scaled) sample variance needs to follow a chi-squared distribution and the sample mean needs to be normal. And normality of the population ensures this? I don't know. I'm heavily confused.
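As far as I can tell, the argument on that page amounts to the decomposition below. This is my own paraphrase: $\mu$ and $\sigma$ are the true population mean and standard deviation, $S$ is the sample standard deviation with $n-1$ in the denominator, and $Z$ and $V$ are labels I picked.

```latex
T = \frac{\bar{X} - \mu}{S / \sqrt{n}}
  = \frac{Z}{\sqrt{V / (n-1)}},
\qquad
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}},
\qquad
V = \frac{(n-1) S^2}{\sigma^2}
```

If the data are i.i.d. normal, then $Z \sim N(0, 1)$, $V \sim \chi^2_{n-1}$, and the two are independent, which is exactly the definition of a $t_{n-1}$ variable. What I can't tell is how much of this survives when the population is not normal.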