This table compares what the test results show (the study results here are referred to as "test results") with the true status of the null hypothesis. If the null hypothesis is true but the study results suggest that it should be rejected, then we have committed an error. This is a false positive error: we have obtained a positive result in favour of our alternative hypothesis, and it may lead us to believe that our theory is correct when in fact it is not. Hence we label it a Type I error. It is also referred to as the \(\alpha\) error, and we must specify this error rate before the study begins. Depending on how stringent we want to be, we can set it to any value; by convention, it is set at 5%. By setting it so, we state that if we were to conduct our study over and over again, then out of 100 iterations we would wrongly reject a true null hypothesis at most 5 times; put another way, if the null hypothesis were true, we would correctly fail to reject it in 95 of those 100 iterations.
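To make this idea concrete, here is a minimal simulation sketch (not part of the original text, and with arbitrary group sizes and distributions chosen purely for illustration): both groups are drawn from the same population, so the null hypothesis is true by construction, and a test at \(\alpha = 0.05\) should wrongly reject in roughly 5 out of every 100 repetitions.

```python
# Illustrative simulation of the Type I (alpha) error rate.
# Assumption: both groups come from the same normal population,
# so the null hypothesis of "no difference" is true by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
alpha = 0.05
iterations = 10_000
false_positives = 0

for _ in range(iterations):
    group_a = rng.normal(loc=0.0, scale=1.0, size=50)
    group_b = rng.normal(loc=0.0, scale=1.0, size=50)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:          # we reject a null that is actually true
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / iterations:.3f}")
# This should come out close to 0.05 -- about 5 wrong rejections per 100 studies.
```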
Compare this with the other error, the false negative, which we term the \(\beta\) error. Here we collect and analyse our data and conclude that we cannot reject the null hypothesis, when in fact the null hypothesis is false and should have been rejected. We set this error rate in advance as well, commonly at 20%; by doing so we state that if we were to conduct 100 iterations of this study, we might be wrong in this way in about 20 of them, and in the remaining 80 we would correctly reject the null hypothesis when it is indeed false. This probability of correctly rejecting a false null hypothesis (here 80%) is the power of the study; it is a different quantity from the \(\alpha\) error, which concerns wrongly rejecting a true null hypothesis. The \(\alpha\) error, the power we aim for, and the effect size we would like to detect together allow us to estimate the sample size required for any study we would like to conduct, as sketched below; note that all of these figures must be specified before the study begins.
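As a hedged sketch of how these three ingredients translate into a sample size, the snippet below uses the power calculator in the Python statsmodels library for a two-group comparison of means. The standardised effect size of 0.5 is an assumed value for illustration; it is not taken from the text.

```python
# Sketch: sample size from alpha, power, and an assumed effect size.
# Uses statsmodels' power analysis for an independent two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # assumed standardised difference (Cohen's d)
    alpha=0.05,        # Type I error rate, fixed before the study
    power=0.80,        # 1 - beta, fixed before the study
)
print(f"Required participants per group: {n_per_group:.0f}")  # roughly 64
```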
Then we conduct the study and obtain figures to compare with the null. What we compare depends on the effect measure we choose for the study. If we settle on an odds ratio or a relative risk to estimate the association between two variables (an exposure variable and an outcome variable), then under the null hypothesis the effect size would be 1.0. If, on the other hand, we want to study the difference between two measures, then under the null hypothesis the effect size would be 0. We also assume, as we conduct our study, that it is one of the myriad possible iterations, so the effect estimate we obtain should follow an approximately normal sampling distribution; we can therefore estimate the probability that the effect size we have obtained would arise from the distribution expected under the null. Note that the null hypothesis fixes only the point estimate, at 1.0 for ratio-type measures or 0.0 for difference-type measures; because of sampling variation, estimates obtained under a true null would still scatter in a band around that value. Do our point estimate and the band we construct around it fall within that expected range, or do they lie outside it? If outside, how far outside?
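As a general sketch (the notation here is introduced for illustration, not quoted from the text), for a ratio measure this comparison with the null is usually made on the log scale, where the sampling distribution is closer to normal:
\[
z = \frac{\ln(\widehat{\mathrm{OR}}) - \ln(1.0)}{\mathrm{SE}\left(\ln \widehat{\mathrm{OR}}\right)}
  = \frac{\ln(\widehat{\mathrm{OR}})}{\mathrm{SE}\left(\ln \widehat{\mathrm{OR}}\right)},
\qquad
p = 2\left(1 - \Phi(|z|)\right),
\]
where \(\Phi\) is the standard normal cumulative distribution function; for a difference-type measure, the observed difference minus the null value 0 takes the place of the log odds ratio.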
For example, let's say we conduct a study on arsenic exposure through drinking water at various levels and the risk of skin diseases. Further, for illustration, let's say we would like to compare the risk of skin lesions for people who were exposed to inorganic arsenic at 50 \(\mu\)g/L or less with the risk for those exposed to levels of 200 \(\mu\)g/L or more. We fix those exposed to the lowest level of arsenic (50 \(\mu\)g/L or less) as the reference category and, at that level, set the odds ratio at the null value of 1.0 (that is, no excess risk). Then, based on our sample, we compare people who were exposed to arsenic at 200 \(\mu\)g/L or more with those exposed to 50 \(\mu\)g/L or less. The effect measure is the odds ratio, and let's say we obtain a point estimate of 2.5 for our sample; we interpret this as meaning that, compared with those who did not have skin lesions, those who had skin lesions had 2.5 times the odds of being in the arsenic exposure group of 200 \(\mu\)g/L or more. But this is only part of the story, as we have based it on this one sample of people. If we were to conduct this study 100 times over, what would the distribution of the effect measure look like? So we calculate that as well, and let's say we find that the estimate would lie between 1.5 and 4.5 in 95% of those 100 iterations. Based on these figures we state that the best estimate of the population odds ratio for the association between arsenic exposure at the highest level (200 \(\mu\)g/L or more) versus the lowest level and skin lesions is 2.5, with a 95% confidence interval of 1.5 to 4.5; values outside this interval would lie in the 5% extremes. We then go ahead and state that there is a statistically significant association between high arsenic exposure and skin lesions.

We can also estimate the probability that our point estimate would be compatible with the null hypothesis. Let's say we get a figure of around 2%: there is a 2% chance that an effect size at least this extreme could arise under conditions of the null. We then know that we have rejected the null hypothesis at the 5% level, and we express this probability estimate as p = 0.02. This is the "p-value". So you can see that the p-value tells us no more than the probability we can attach to findings at least as extreme as ours if the null were true; on its own it tells us nothing about the size or distribution of the effect. One could nevertheless deem the study statistically significant. We will leave it at that, and I encourage you to read more about the issues around p-values. A particularly useful paper to review about p-values is Andrew Gelman's commentary \cite{gelman2013commentary}.
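The sketch below shows, in Python, the kind of calculation that sits behind such figures. The 2x2 cell counts are invented for illustration and are not the study's data, so the output (an odds ratio of 2.25 with a Woolf-type confidence interval) will not exactly reproduce the 2.5 and 1.5-4.5 quoted above.

```python
# Hypothetical 2x2 table for the arsenic example (invented counts):
#                            skin lesions   no skin lesions
# arsenic >= 200 ug/L              30              40
# arsenic <= 50 ug/L (reference)   20              60
import math
from scipy import stats

a, b = 30, 40   # exposed cases, exposed controls
c, d = 20, 60   # unexposed cases, unexposed controls

odds_ratio = (a * d) / (b * c)

# Woolf (log) method for the 95% confidence interval
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
log_or = math.log(odds_ratio)
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)

# Two-sided p-value against the null value OR = 1.0
z = log_or / se_log_or
p_value = 2 * stats.norm.sf(abs(z))

print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}, p = {p_value:.3f}")
# Expected output: OR = 2.25, 95% CI 1.13-4.50, p = 0.022
```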
You can see that p-values and 95% confidence interval estimates address one question: whether an association we observe could have arisen through chance alone. If the association is statistically significant, you can argue that chance is an unlikely explanation. But that settles only one aspect of health-related research: there are two other issues we need to settle before we can say that the association we observe between an exposure or an intervention and a health outcome is a true association. These are biases present in a study, which can make the study invalid, and confounding variables that were not adjusted for, which again leave the association open to suspicion. Before we delve into those, let's take a look at issues around spurious correlations.

Spurious correlations & ecological fallacy

Spurious correlations are those in which two variables appear to be associated with each other, and the association is too strong to be attributed to chance alone, yet it makes no sense for such an association to exist. I recommend you review a website (and a book linked to the website) by Tyler Vigen \cite{vigen2015spurious} here:
http://www.tylervigen.com/spurious-correlations
Take a look at the first chart from the book: