Teaching The Narrow Way: April 2022

Importance of raw data in knowing practical or clinical significance as opposed to statistical significance

A t test is sometimes of little use for achieving real-life goals, yet in medical science it is very common to publish t tests comparing two sets of data without showing the raw data. Although it can make sense to withhold parts of the raw data to keep patient information confidential, identifying data such as patient names could simply be replaced with non-identifying information such as ID numbers from which the patient name cannot be determined. This would not be possible in every case, for example when studying whether the latitude or longitude of a patient's primary residence correlates with health conditions, but it would probably have been possible in most past studies.

In many cases it may be more useful to know what each of two distributions looks like than to know whether the two distributions are statistically significantly different at a certain alpha value, or whether one distribution is statistically significantly greater than the other at a certain alpha value.

In many cases the question of practical or clinical significance is more important than the question of statistical significance. A question of practical significance could be: what is the probability that a single specific choice will result in a value that is acceptable for my goal? At other times it could be: out of all the available choices, which one has the highest probability of resulting in a value acceptable for my goal?

In such a case it might make more sense to simply look at the raw data of a study, count the number of cases in which the observed value was at least the minimum acceptable value and no more than the maximum acceptable value, and then divide that count by the total number of cases.

For example, if someone is comparing two treatments for blood pressure and wants to decrease a patient's blood pressure by at least 10 units but by no more than 50 units, it would make more sense to figure out how often each treatment in a study resulted in a decrease of at least 10 units but no more than 50 units than to use a t test. This would be done, for each treatment's distribution, by counting the number of cases that resulted in a decrease of at least 10 units but no more than 50 units and then dividing by the total number of cases. The person choosing between the two treatments then figures out which treatment has the higher percentage of acceptable results and chooses that treatment.
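The counting described above can be sketched in a few lines of Python. This is only an illustration: the function name `fraction_in_range` is hypothetical, and the two lists of blood pressure decreases are made-up numbers, not data from any study.

```python
def fraction_in_range(values, low, high):
    """Return the fraction of values v with low <= v <= high."""
    hits = sum(1 for v in values if low <= v <= high)
    return hits / len(values)

# Hypothetical decreases in blood pressure (invented for illustration only).
treatment_1 = [4, 12, 18, 25, 33, 47, 55, 9, 21, 30]
treatment_2 = [11, 14, 8, 52, 60, 5, 19, 7, 3, 58]

share_1 = fraction_in_range(treatment_1, 10, 50)
share_2 = fraction_in_range(treatment_2, 10, 50)

print(f"treatment 1: {share_1:.0%} of cases acceptable")
print(f"treatment 2: {share_2:.0%} of cases acceptable")
print("choose treatment", 1 if share_1 > share_2 else 2)
```

With these invented numbers, treatment 1 has 7 acceptable cases out of 10 and treatment 2 has 3 out of 10, so treatment 1 would be chosen.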

This cannot be done without the raw data, which is commonly left out of journal articles that report the result of a t test comparing two sets of data. If the raw data is omitted but a mean and standard deviation are given for each set, it might still be possible to estimate, for each of the two choices, the percentage of cases that produced a decrease of at least 10 units but no more than 50 units in the example above. This would be done by using the published mean and standard deviation to construct a distribution representing each treatment, on the assumption that each is normally distributed. This, however, would be a very bad idea if the article presents no results from tests of normality. I would not be surprised if it is very common for articles to show no evidence that the two distributions were actually normally distributed before conducting a t test. I would like to see a study of the percentage of studies in which a t test was performed without checking for a normal distribution.
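As a rough sketch of the estimate described above, the share of cases between 10 and 50 units can be read off a normal distribution built from a published mean and standard deviation. The means and standard deviations below are hypothetical, the function name is my own label for this sketch, and the result is only as good as the normality assumption.

```python
from statistics import NormalDist

def normal_fraction_in_range(mean, sd, low, high):
    """Estimated fraction of cases between low and high,
    ASSUMING the outcomes are normally distributed."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical published summaries (not from any real article).
est_1 = normal_fraction_in_range(mean=28.0, sd=12.0, low=10, high=50)
est_2 = normal_fraction_in_range(mean=20.0, sd=25.0, low=10, high=50)

print(f"treatment 1: about {est_1:.1%} estimated acceptable")
print(f"treatment 2: about {est_2:.1%} estimated acceptable")
```

If the real data are skewed or have heavy tails, these estimates can be far from the frequencies that counting the raw data would give.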

Sometimes a question of practical significance is not so delicate as to have both an upper and a lower limit on the desired outcome. For example, someone might want a firearm that can shoot a distance of at least 200 units without caring whether it can shoot too far. It would make more sense to look at the raw data for each firearm choice and calculate the percentage of shots that reached a firing distance of at least 200 units than to conduct a t test on the raw data to determine whether two firearm choices produced statistically significantly different results at some alpha value, or whether one produced statistically significantly greater results at some alpha value.
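A one-sided version of the same counting might look like the sketch below. The firearm names and firing distances are invented for illustration, and the sketch treats a shot of exactly 200 units as acceptable.

```python
def fraction_at_least(values, minimum):
    """Return the fraction of values that are >= minimum."""
    return sum(1 for v in values if v >= minimum) / len(values)

# Hypothetical firing distances for two firearm choices (made-up numbers).
distances = {
    "firearm X": [180, 210, 250, 195, 230],
    "firearm Y": [205, 215, 190, 260, 240],
}

shares = {name: fraction_at_least(d, 200) for name, d in distances.items()}
best = max(shares, key=shares.get)

for name, share in shares.items():
    print(f"{name}: {share:.0%} of shots reached at least 200 units")
print("best choice:", best)
```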

Sometimes a question of practical significance is less delicate still: someone might not even care whether a minimum number of units was produced, and only want to know which of two choices produced the greater result most often. Someone might say that now we can finally use a t test legitimately. In such a case you could use a t test, but it still would not be the best choice, for at least two reasons.

Firstly, a t test would not be the best choice because the question was not which choice had the greater mean but which choice produced a greater result than the other choice most often. Secondly, a t test would not be the best choice because you cannot know whether the results of both distributions will be normally distributed before conducting the test.

I am going to describe a possibly new statistical test I thought of, which may or may not already have been created by someone else under another name. I will name this test the "Percentage of greater values between two distributions" test.

Test steps

This test works for ordinal, interval or ratio data and it does not matter if the two sets of data being compared have equal sample sizes or not or if they have odd or even sample sizes.

After collecting data from two samples, order the values of each sample in a list in ascending or descending order.

Pick one of the two lists.

For each value on the chosen list, look at every value on the other list and count 1 point for each one that is below the chosen value, 0.5 points for each one that is exactly equal to it, and 0 points for each one that is higher. Each value on the chosen list will then have a score, the sum of its points. If the other list contains the same value more than once, count each occurrence separately.

Add together the scores of all the values on the chosen list and divide the total by the product of the sample size of the chosen list and the sample size of the other list. If a value occurs more than once on the chosen list, include its score once for each occurrence.

You can then swap the lists and get a score for the other list as well. The final value for each list is an estimate of the percentage of the time that a value randomly selected from the population that list was sampled from will be greater than a value randomly selected from the population the other list was sampled from.

In some cases involving data with discrete, quantized values it may be better to count ties as 0 points instead of 0.5 points, as long as you do so consistently for both lists. I chose 0.5 points on the assumption of non-quantized data, in which a tied pair would have a 50% chance of falling either way if measured more finely, such as one data point being 1.49999 and the other 1.50001 instead of both being recorded as 1.5 when measured to two significant digits.
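The steps above can be sketched in Python as follows. The function name `percent_greater` and the `tie_points` parameter are hypothetical names chosen for this sketch, and the two short lists are made-up values picked to include ties and unequal sample sizes. As far as I can tell, the resulting score is the same quantity that is elsewhere called the probability of superiority, which equals the Mann-Whitney U statistic divided by the product of the two sample sizes.

```python
def percent_greater(chosen, other, tie_points=0.5):
    """Estimate how often a value from `chosen` is greater than a value
    from `other`, scoring every pair of values once."""
    total = 0.0
    for c in chosen:
        for o in other:
            if o < c:
                total += 1.0          # other value is below: 1 point
            elif o == c:
                total += tie_points   # tie: 0.5 points by default
            # other value is higher: 0 points
    return total / (len(chosen) * len(other))

# Made-up data with repeated values and unequal sample sizes.
chosen = [3, 5, 5, 8]
other = [1, 5, 9]

print(percent_greater(chosen, other))   # 6 points / 12 pairs = 0.5
print(percent_greater(other, chosen))   # 6 points / 12 pairs = 0.5
print(percent_greater(chosen, other, tie_points=0))  # 5 points / 12 pairs
```

With ties scored at 0.5 the two scores always add up to 1, so computing one list's score also gives the other. With ties scored at 0 they can add up to less than 1.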

Examples

I will use this possibly new test to compare two sets of random uniform data, each with a sample size of five, one between 0 and 1 and the other between 0.1 and 1.1, both to 4 decimal places, from

https://www.random.org/decimal-fractions/

https://web.archive.org/web/20220402133005/https://www.random.org/decimal-fractions/

I will add 0.1 to the second set of numbers between 0 and 1 to get a range between 0.1 and 1.1.

(I was hoping this would increase the likelihood of the two distributions looking different, but unfortunately the sample that had 0.1 added to it turned out to have the lower mean and median, so the opposite of the intended result was produced.)

In order to compare this test with a t test, I will treat one minus the percentage of the time that a randomly selected value from one list is greater than a randomly selected value from the other list as the p value for the possibly new test, even though this does not line up exactly with a t test, because a t test only asks whether one mean is likely to be greater than another. I will also look at the percentage of values from one list that are less than, or less than or equal to, the mean and the median of the other list (four combinations: a choice of mean or median and a choice of less than or less than or equal to).

Here are your random numbers:

Timestamp: 2022-04-11 00:42:01 UTC

List A (sample A in order from least to greatest)

0.2781

0.3538

0.5241

0.8349

0.8425

Sample B before adding 0.1

Here are your random numbers:

Timestamp: 2022-04-11 00:46:43 UTC

0.1081

0.3479

0.3923

0.4587

0.7225

List B ( sample B in order after 0.1 is added)

0.2081

0.4479

0.4923

0.5587

0.8225

Start of example for "Percentage of greater values between two distributions" test

Value B, Value A, points ( 1 if A<B, 0.5 if A=B, 0 if A>B)

0.2081, 0.2781, 0

0.2081, 0.3538, 0

0.2081, 0.5241, 0

0.2081, 0.8349, 0

0.2081, 0.8425, 0

0.4479, 0.2781, 1

0.4479, 0.3538, 1

0.4479, 0.5241, 0

0.4479, 0.8349, 0

0.4479, 0.8425, 0

0.4923, 0.2781, 1

0.4923, 0.3538, 1

0.4923, 0.5241, 0

0.4923, 0.8349, 0

0.4923, 0.8425, 0

0.5587, 0.2781, 1

0.5587, 0.3538, 1

0.5587, 0.5241, 1

0.5587, 0.8349, 0

0.5587, 0.8425, 0

0.8225, 0.2781, 1

0.8225, 0.3538, 1

0.8225, 0.5241, 1

0.8225, 0.8349, 0

0.8225, 0.8425, 0

Total points 10

Sample Size A * Sample Size B = 5 * 5 = 25

Total Points / (Sample Size A * Sample Size B) = 10 / 25 = 0.40

estimated 40% chance that a random value from population B is greater than a random value from population A

assumed one tailed P value = 1 - 0.40 = 0.60, used for comparing with a one tailed t test of population B > A

100% - 40% = 60%

estimated 60% chance that a random value from population A is greater than a random value from population B

assumed one tailed P value = 1 - 0.60 = 0.40, used for comparing with a one tailed t test of population A > B

Value A, Value B, points ( 1 if B<A, 0.5 if B=A, 0 if B>A)

0.2781, 0.2081, 1

0.2781, 0.4479, 0

0.2781, 0.4923, 0

0.2781, 0.5587, 0

0.2781, 0.8225, 0

0.3538, 0.2081, 1

0.3538, 0.4479, 0

0.3538, 0.4923, 0

0.3538, 0.5587, 0

0.3538, 0.8225, 0

0.5241, 0.2081, 1

0.5241, 0.4479, 1

0.5241, 0.4923, 1

0.5241, 0.5587, 0

0.5241, 0.8225, 0

0.8349, 0.2081, 1

0.8349, 0.4479, 1

0.8349, 0.4923, 1

0.8349, 0.5587, 1

0.8349, 0.8225, 1

0.8425, 0.2081, 1

0.8425, 0.4479, 1

0.8425, 0.4923, 1

0.8425, 0.5587, 1

0.8425, 0.8225, 1

Total points 15

Sample Size A * Sample Size B = 5 * 5 = 25

Total Points / (Sample Size A * Sample Size B) = 15 / 25 = 0.60

estimated 60% chance that a random value from population A is greater than a random value from population B

assumed one tailed P value = 1 - 0.60 = 0.40, used for comparing with a one tailed t test of population A > B

100% - 60% = 40%

estimated 40% chance that a random value from population B is greater than a random value from population A

assumed one tailed P value = 1 - 0.40 = 0.60, used for comparing with a one tailed t test of population B > A

End of test example for "Percentage of greater values between two distributions" test
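The two tables above can be regenerated with a short script, which is a convenient way to check the point totals. The helper name `points` is a hypothetical label for this sketch. The data are List A and List B from this example.

```python
list_a = [0.2781, 0.3538, 0.5241, 0.8349, 0.8425]
list_b = [0.2081, 0.4479, 0.4923, 0.5587, 0.8225]

def points(chosen_value, other_value):
    """1 if the other value is lower, 0.5 if equal, 0 if higher."""
    if other_value < chosen_value:
        return 1.0
    if other_value == chosen_value:
        return 0.5
    return 0.0

# One row per pair, in the same order as the tables above.
rows_b = [(b, a, points(b, a)) for b in list_b for a in list_a]
rows_a = [(a, b, points(a, b)) for a in list_a for b in list_b]

pairs = len(list_a) * len(list_b)
total_b = sum(p for _, _, p in rows_b)
total_a = sum(p for _, _, p in rows_a)

for row in rows_b:
    print(*row, sep=", ")
print("B greater than A:", total_b, "/", pairs, "=", total_b / pairs)
print("A greater than B:", total_a, "/", pairs, "=", total_a / pairs)
```

The totals come out to 10 points and 15 points out of 25 pairs, matching the 40% and 60% figures above.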

Comparing with t-test

One tailed t-test to determine if population A > B

https://www.socscistatistics.com/tests/studentttest/default2.aspx

p-value is .351604
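The p-value above can be checked offline with a small script. This sketch assumes the online calculator performs a pooled-variance (Student's) two-sample t test, and it gets the tail area by numerically integrating the t density with Simpson's rule rather than calling a statistics library. The function names are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tailed_t_test(first, second, steps=2000):
    """Pooled-variance t test of mean(first) > mean(second).
    Returns (t statistic, one-tailed p-value). `steps` must be even."""
    n1, n2 = len(first), len(second)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / df
    t = (mean(first) - mean(second)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # Simpson's rule for the area under the density between 0 and |t|.
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper_tail = 0.5 - total * h / 3   # P(T > |t|)
    return t, (upper_tail if t >= 0 else 1 - upper_tail)

list_a = [0.2781, 0.3538, 0.5241, 0.8349, 0.8425]
list_b = [0.2081, 0.4479, 0.4923, 0.5587, 0.8225]

t_stat, p_value = one_tailed_t_test(list_a, list_b)
print(f"t = {t_stat:.4f}, one-tailed p = {p_value:.4f}")
```

Under the pooled-variance assumption this gives a p-value of about 0.35, in line with the figure reported by the calculator.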

Comparing with percent of values above and below mean and median of data from other list

Percent of values in one list that are less than or greater than mean or median of values from other list

List B ( sample B in order after 0.1 is added)

0.2081

0.4479

0.4923

0.5587

0.8225

median = 0.4923

mean = 0.5059

list of values from sample A that are less than mean of sample B ( 0.2781, 0.3538 )

list of values from sample A that are greater than mean of sample B ( 0.5241, 0.8349, 0.8425 )

no values from sample A that are equal to the mean of sample B

list of values from sample A that are less than median of sample B ( 0.2781, 0.3538 )

list of values from sample A that are greater than median of sample B ( 0.5241, 0.8349, 0.8425 )

no values from sample A that are equal to the median of sample B

60% of values from A are greater than or equal to the mean of B

60% of values from A are greater than the mean of B

60% of values from A are greater than or equal to the median of B

60% of values from A are greater than the median of B

https://www.calculatorsoup.com/calculators/statistics/average.php

https://web.archive.org/web/20220324192741/https://www.calculatorsoup.com/calculators/statistics/average.php

List A (sample A in order from least to greatest)

0.2781

0.3538

0.5241

0.8349

0.8425

median = 0.5241

mean = 0.56668

list of values from sample B that are less than mean of sample A ( 0.2081, 0.4479, 0.4923, 0.5587 )

list of values from sample B that are greater than mean of sample A ( 0.8225 )

no values from sample B that are equal to the mean of sample A

list of values from sample B that are less than median of sample A ( 0.2081, 0.4479, 0.4923 )

list of values from sample B that are greater than median of sample A ( 0.5587, 0.8225 )

no values from sample B that are equal to the median of sample A

20% of values from B are greater than or equal to the mean of A

20% of values from B are greater than the mean of A

40% of values from B are greater than or equal to the median of A

40% of values from B are greater than the median of A
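The mean and median comparisons above can be reproduced with Python's standard `statistics` module. The helper name `share_above` is hypothetical. The data are List A and List B from this example.

```python
from statistics import mean, median

def share_above(values, cutoff, inclusive=False):
    """Fraction of values greater than (or, if inclusive, greater than
    or equal to) the cutoff."""
    if inclusive:
        count = sum(1 for v in values if v >= cutoff)
    else:
        count = sum(1 for v in values if v > cutoff)
    return count / len(values)

list_a = [0.2781, 0.3538, 0.5241, 0.8349, 0.8425]
list_b = [0.2081, 0.4479, 0.4923, 0.5587, 0.8225]

for name, values, other_name, other in (("A", list_a, "B", list_b),
                                        ("B", list_b, "A", list_a)):
    for label, cutoff in (("mean", mean(other)), ("median", median(other))):
        for inclusive in (True, False):
            relation = "greater than or equal to" if inclusive else "greater than"
            share = share_above(values, cutoff, inclusive)
            print(f"{share:.0%} of values from {name} are {relation} "
                  f"the {label} of {other_name}")
```

Because no value in either list is exactly equal to the other list's mean or median, the inclusive and exclusive percentages agree in this example.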

Start of example for "Percentage of greater values between two distributions" test

Sample E (the same values as sample B before 0.1 was added)

0.1081

0.3479

0.4587

0.7225

0.3923

List E (sample E in order from least to greatest)

0.1081

0.3479

0.3923

0.4587

0.7225

Calculating scores for list E

Value E, Value A, points ( 1 if A<E, 0.5 if A=E, 0 if A>E)

0.1081, 0.2781, 0

0.1081, 0.3538, 0

0.1081, 0.5241, 0

0.1081, 0.8349, 0

0.1081, 0.8425, 0

0.3479, 0.2781, 1

0.3479, 0.3538, 0

0.3479, 0.5241, 0

0.3479, 0.8349, 0

0.3479, 0.8425, 0

0.3923, 0.2781, 1

0.3923, 0.3538, 1

0.3923, 0.5241, 0

0.3923, 0.8349, 0

0.3923, 0.8425, 0

0.4587, 0.2781, 1

0.4587, 0.3538, 1

0.4587, 0.5241, 0

0.4587, 0.8349, 0

0.4587, 0.8425, 0

0.7225, 0.2781, 1

0.7225, 0.3538, 1

0.7225, 0.5241, 1

0.7225, 0.8349, 0

0.7225, 0.8425, 0

Total points 8

Sample Size E * Sample Size A = 5 * 5 = 25

Total Points / (Sample Size E * Sample Size A) = 8 / 25 = 0.32

estimated 32% chance that a random value from population E is greater than a random value from population A

assumed one tailed P value = 1 - 0.32 = 0.68, used for comparing with a one tailed t test of population E > A

100% - 32% = 68%

estimated 68% chance that a random value from population A is greater than a random value from population E

assumed one tailed P value = 1 - 0.68 = 0.32, used for comparing with a one tailed t test of population A > E

End of example for "Percentage of greater values between two distributions" test
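For larger samples, writing out every pair becomes tedious. Because the test starts by ordering the lists, the same score can be obtained by looking up each chosen value's position in the sorted other list. The sketch below does this with Python's `bisect` module. The function name is hypothetical, and the data are List A and List E from this example.

```python
from bisect import bisect_left, bisect_right

def percent_greater_sorted(chosen, other, tie_points=0.5):
    """Same score as comparing every pair, computed by position lookup
    in the sorted other list."""
    ordered = sorted(other)
    total = 0.0
    for value in chosen:
        below = bisect_left(ordered, value)            # other values < value
        equal = bisect_right(ordered, value) - below   # other values == value
        total += below + tie_points * equal
    return total / (len(chosen) * len(other))

list_a = [0.2781, 0.3538, 0.5241, 0.8349, 0.8425]
list_e = [0.1081, 0.3479, 0.3923, 0.4587, 0.7225]

e_over_a = percent_greater_sorted(list_e, list_a)
a_over_e = percent_greater_sorted(list_a, list_e)
print(f"E greater than A: {e_over_a:.0%}")
print(f"A greater than E: {a_over_e:.0%}")
```

This reproduces the 32% and 68% figures above without listing all 25 pairs.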

Comparing with other tests

one tailed t-test for mean of population A > mean of population E

P value = 0.163344

https://www.socscistatistics.com/tests/studentttest/default2.aspx


Sunday, April 10, 2022

Percentage of greater values between two distributions
