Imagine you have a deck of 52 uniquely labeled playing cards, each one different, so that no two cards repeat the same label. Shuffle the cards "randomly" and place them face down in a single pile of 52 so that you cannot see the labels. If whatever card is on top is drawn each time, the cards will be drawn in one specific order, and only that order.
Normally each unique card label is assigned a probability of 1/52, or a (100/52) percent chance, of being drawn on the first draw, and every card label individually is assigned exactly the same probability.
In reality there is only one card on top of the deck of 52 cards, and it has a 100% chance of being drawn when the top card is drawn; all other cards have a 0% chance of being drawn.
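The card example above can be sketched in a few lines of Python (a minimal illustration; the card labels and variable names are my own):

```python
import random

# Build a deck of 52 uniquely labeled cards: no two labels repeat.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]

random.shuffle(deck)  # "randomly" order the pile, face down

# Once shuffled, the pile has one fixed order: the card at index 0
# is the top card, and drawing the top card each time yields the
# deck in exactly that one order.
top_card = deck[0]
draw_order = list(deck)  # the single order in which the cards will be drawn

# Before looking, 1/52 is assigned to each label; after the shuffle,
# whichever card sits on top is already determined.
print(f"The top card is certain to be: {top_card}")
```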
Forms of statistics that use probability are neither scientific nor mathematical. Forms of statistics that use probability include mathematical functions, but they are not truly a form of math, or at least not a truthful and factually correct form of math or science, in terms of accurately representing reality.
It is possible to use math without being scientifically minded in attitude or mathematically truthful. For example, someone could propose a formula stating that the height in inches of any and every man in Sweden equals 50 plus the number of fingers he has. They could even measure many men in Sweden and find many (but not all) of them to be 60 inches tall, rounded to the nearest inch, while simultaneously having 10 fingers. That would be a mathematical formula to represent height, but it would not be valid math or science: even if the conclusion were correct for a large number of Swedish men, the reasoning used to arrive at such a formula would not be valid. You could call it math if you want to, but it would not be truthful math. You could call anything in which quantities are calculated using mathematical formulas math if you want to, but you really should not call probability math in the sense of being valid, reasonable, correct, or truthful math.
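The invalid height formula from that example could be written out as follows (a sketch; the function name is hypothetical):

```python
# The (deliberately invalid) formula from the example: height in
# inches equals 50 plus the number of fingers. Writing it as code
# does not make the reasoning behind it valid.
def predicted_height_inches(finger_count):
    return 50 + finger_count

# A man with 10 fingers is "predicted" to be 60 inches tall.
print(predicted_height_inches(10))  # 60
```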
Statistics with probability is not valid math even though it uses mathematical functions to do calculations. Instead, the use of statistics with probability is a religious worldview held by many people who label themselves scientists. It consists of a protocol for calculating probabilities, but as I already demonstrated with the cards example, that protocol gives you the incorrect answer.
I do not have any problem with statistics without probability: you could count how many of something there is, call that statistics, and that is fine. But the use of statistics with probability, such as paired and unpaired t-tests, ANOVA tests, etc., is not in line with a proper use of the scientific method, even if quantities are measured and then calculations are done with those measured quantities.
Statistics with probability is not pure math; it is applied math, and the application of that math does not apply correctly to represent reality.
The scientific method is testable, falsifiable, observable, repeatable, accurately predictive, and measurable. The use of probability in "science" is neither repeatably and accurately predictive nor falsifiable.
For example, Newton's laws of motion will give you a single answer for kinematic problems involving the predicted location of an object at a specific time, based on a mathematical formula. Those predictions do not give a distribution of probabilities over different locations, but a single location for a single object at a single point in time that can be measured. Either the object is at that location or it is not, within a certain margin of reasonable error and rounding. Newtonian laws (excluding G but including the use of g) have been shown to accurately and repeatably predict the location of objects in the world we live on. A prediction is made based on an input, an output occurs, and the output can then be judged as either refuting or confirming the prediction within a certain acceptable margin of error. That is why Newton's laws can be tested using the scientific method: the model is not probabilistic but falsifiable.
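A kinematic prediction of this kind can be sketched in Python (the measured value and the margin here are made up for illustration; the drop-distance formula d = (1/2) g t^2 for free fall from rest is standard Newtonian mechanics):

```python
# A falsifiable Newtonian prediction: an object dropped from rest
# falls a distance d = (1/2) * g * t**2 (ignoring air resistance).
# The prediction is one specific number checked against a
# measurement within a stated margin of error.
g = 9.81  # m/s^2, standard value of g near Earth's surface

def predicted_drop_distance(t_seconds):
    return 0.5 * g * t_seconds ** 2

prediction = predicted_drop_distance(2.0)  # 19.62 meters
measured = 19.7   # a hypothetical measured value, in meters
margin = 0.2      # acceptable margin of error, in meters

# The prediction is either confirmed or refuted: there is no
# distribution of probable locations, only one location to test.
confirmed = abs(measured - prediction) <= margin
print(prediction, confirmed)
```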
When metric measurement tools and a decimal number system are used, the margin of reasonable error is based not on probability distributions of measurements but on significant digits. With significant digits, the range of possible measurements is explained by the limitations of the measurement tools. There is an upper maximum measurement and a lower minimum measurement, based on rounding up and rounding down to the nearest second-to-last digit. An exact and precise range of output answers can be calculated by using the values for each input rounded down or rounded up to the second-to-last digit, with no need for probability whatsoever to determine what range is acceptable for the output. For example, if something is measured as a length of 3.15 meters, its actual length could be anywhere between a minimum of 3.1 meters and a maximum of 3.2 meters, and a range of acceptable outputs can be determined by plugging a range of input values between 3.1 meters and 3.2 meters into whatever function is used to predict the output. There is no need in such a case to assign probabilities to each output value: if the output is outside the range the function failed to predict it, and if it is within the range the function succeeded in predicting it.
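This interval reasoning can be sketched directly (the prediction function here, the perimeter of a square, is a hypothetical stand-in for whatever function is being tested):

```python
# Interval reasoning from the measurement example: a length recorded
# as 3.15 m is taken to lie between 3.1 m and 3.2 m, and the range
# of acceptable outputs comes from plugging both bounds into the
# prediction function. No probabilities are needed.
def predicted_output(length_m):
    # hypothetical prediction function, e.g. perimeter of a square
    return 4 * length_m

low, high = 3.1, 3.2  # input bounds from the measurement limits
out_low = predicted_output(low)    # about 12.4
out_high = predicted_output(high)  # about 12.8

def prediction_succeeds(measured_output):
    # inside the range: the function predicted correctly;
    # outside the range: the function is falsified.
    return out_low <= measured_output <= out_high

print(prediction_succeeds(12.5), prediction_succeeds(13.0))
```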
The scientific method is falsifiable, meaning that a formula can be shown to falsely reflect reality if the output is not in the range calculated for the input. Probabilistic statistical methods such as t-tests work entirely differently and are fundamentally non-falsifiable, despite a false pretense of falsifiability. First, a normal distribution is typically assumed with no proof that such a distribution exists in reality, although sometimes methods are used to try to test whether the distribution actually is normal, again only within a certain probability. Then, based on the faulty assumption that both samples or populations are normally distributed, a standard deviation and a mean are calculated separately for each of the two groups. Then an alpha value is arbitrarily chosen, and an arbitrary decision is made as to whether a one-tailed or two-tailed test will be used, with some fake justification as to why one or two tails is chosen. It is then decided whether the two groups share the same distribution, with the difference in their means caused by random variance, or have two different distributions caused by a non-random factor. If the two groups have no difference in their means (a difference of zero), such a t-test will not be needed, because no matter what non-zero alpha value is assigned, the conclusion will be the same. If the two groups have different means, whether or not the difference between them is statistically significant depends on the alpha value and the number of tails; if a different alpha value had been chosen, a different conclusion would have been reached.
This testing method is claimed to be scientific and falsifiable, but it really is unfalsifiable, because choosing a different alpha value would change the conclusion. Choosing an alpha value in advance does not magically meet the requirement of falsifiability, because the alpha value is chosen arbitrarily, not by the physical limitations of the tools or any physically observable means, but by how much "risk" the performer (or the performer's supervisor or boss) is comfortable with. You cannot actually know whether the measurements for the groups differ by random or non-random causes; you only allegedly get a probability that they are different, which makes this testing method non-falsifiable.
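The alpha-dependence described above can be demonstrated concretely. In this sketch the two groups of measurements are made-up numbers, the t statistic is Welch's two-sample statistic, and the critical values are standard two-tailed t-table values for 14 degrees of freedom:

```python
import math
import statistics

# Two hypothetical groups of measurements (invented data, purely to
# illustrate how the conclusion depends on the chosen alpha).
group_a = [72.0, 74.0, 71.0, 73.0, 75.0, 74.0, 72.0, 73.0]
group_b = [73.5, 75.5, 72.5, 74.5, 76.5, 75.5, 73.5, 74.5]

# Welch's t statistic for two independent samples.
m_a, m_b = statistics.mean(group_a), statistics.mean(group_b)
v_a, v_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)
t_stat = abs(m_a - m_b) / math.sqrt(v_a / n_a + v_b / n_b)

# Two-tailed critical values from a standard t-table for 14 degrees
# of freedom (the df for these equal-sized, equal-variance samples).
critical = {0.05: 2.145, 0.01: 2.977}

for alpha, t_crit in critical.items():
    significant = t_stat > t_crit
    print(f"alpha={alpha}: significant={significant}")

# The same data are "significant" at alpha = 0.05 and "not
# significant" at alpha = 0.01: the conclusion flips with the
# arbitrarily chosen alpha, which is the objection raised above.
```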
Resorting to t-tests instead of predictive formulas that can potentially be falsified shows a lack of knowledge of the subject matter of the experiment, or a requirement to act as though lacking that knowledge when the test is done for homework, an employer, or a supervisor. For example, if a specific dose of medicine, such as 10 mg, is being tested for its effect on heart rate, a t-test could be used to determine whether the control group not using the medicine has a statistically significantly different heart rate than the test group using the medicine, but this involves no chemical or physical model of how the medicine chemically influences the heart. There is only a comparison between 10 mg and 0 mg, but no function of heart rate based on the input of dosage and other variables; no prediction is made in advance as to what the heart rate will be as a function of dose, such that there would be one specific heart rate for 5 mg and different ones for 10 mg or 7 mg, each predicted output being a specific number for each input. There is no predicted output at all for this kind of testing, only a conclusion about the probability that the results of the two groups are different. The output predicted for each input cannot be falsified when there is no numerical prediction as to what the output actually will be. Claiming the heart rate will be 60 bpm + 1 bpm * dosage / 1 mg is a testable formula, but claiming the heart rate will be different, with no quantity predicted as to the number of bpm, is not a formula at all and therefore certainly not a testable and falsifiable formula. Making such a claim requires no presentation of mathematical formulas involving the chemical reactions of the drugs, and this displays a fundamental lack of knowledge of the subject matter tested in such an example.
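The testable formula from that example can be written out directly (the formula is the hypothetical one given in the text, not a real pharmacological model):

```python
# The testable formula from the example: heart rate in bpm as a
# function of dose in mg. This is the text's hypothetical formula,
# not an actual model of any drug.
def predicted_heart_rate_bpm(dose_mg):
    return 60 + 1 * dose_mg  # 60 bpm + 1 bpm per mg

# Each input yields one specific predicted output that can be
# measured against and potentially falsified:
for dose in (0, 5, 7, 10):
    print(dose, "mg ->", predicted_heart_rate_bpm(dose), "bpm")
```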
Probability-based statistics should only perhaps be used when one admits one does not have an understanding of how the subject matter tested physically or chemically works. In the long run, one should not try to get a better understanding by doing more statistical tests for several years. Instead, one should start proposing formulas based on past observations, test those formulas to see whether they correctly predict future quantities, reject them if they do not, and then test new mathematical formulas until a formula is found that has a trend of making successful predictions. Probability-based statistics can be used rarely and occasionally in the short run, just to get a guess as to whether doing something has any effect at all, but should be replaced with the use of formulas in the long run. Probability-based statistics should not be the primary basis for medicine.
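The propose-test-reject loop described above can be sketched in a few lines (the candidate formulas, observations, and margin here are all invented for illustration):

```python
# A minimal sketch of the propose-test-reject methodology: candidate
# formulas (all hypothetical) survive only while every prediction
# stays within an acceptable margin of the observed outputs.
observations = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, measured output)
margin = 0.3  # acceptable margin of error for a prediction

candidates = {
    "double": lambda x: 2 * x,      # proposes output = 2 * input
    "plus_one": lambda x: x + 1,    # proposes output = input + 1
}

# Reject any formula whose prediction falls outside the margin for
# any observation; keep the ones not yet falsified.
surviving = {
    name: f
    for name, f in candidates.items()
    if all(abs(f(x) - y) <= margin for x, y in observations)
}
print(sorted(surviving))
```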
Look at the percentage of scientific journal articles in industrial engineering, quality control engineering, biology, psychology, sociology, and health-related fields that use statistical tests instead of testing formulas with predicted output as a function of input, and you will understand that the early 21st century and the 20th century were metaphorically in the dark ages when it came to science, and that people were publishing religion disguised as science in many of the articles in scientific journals during this time period.
Look at the curricula of graduate programs, which almost always require statistics classes for graduate students but rarely require pure math courses, and at the proportion of thesis projects in industrial engineering, psychology, sociology, and health-science-related degrees that use statistics instead of mathematical formulas, and understand that in that time period proper scientific research methodology was not being taught in many science, health, and industrial engineering classes.
Copyright Carl Janssen 2021 October 22
