This might not be mathematically accurate in terms of probability theory; I am experimenting with the results of doing different things with numbers. The assumptions I am using do contradict reality, as do those of all probability-based statistical hypothesis tests, but possibly in different ways than some others.
I am literally making up this statistical test and calling it the custom distribution statistical hypothesis test. I do not know whether someone already made a test like this before me and gave it a different name.
Goal : The probability distribution that best represents the population behind a real-life sample is not known before the data is collected, so it is erroneous to assume it will be a normal distribution before analyzing the sample data. I am trying to build a custom frequency distribution model that represents the population from which a sample was collected, and then to create a possibly new type of statistical test from it. I will call this possibly new type of statistical test a "custom distribution statistical hypothesis test."
A "custom distribution statistical hypothesis test" is similar to, and meant as a replacement for, two-sample independent unpaired t tests, only with the assumption of a "custom distribution" instead of a normal distribution.
This statistical test involves determining the probability that the median of one sample would be at least as far as it was measured to be, in a specific direction, from the median of the other sample, using the "custom frequency distribution" to represent the population of the other sample. This is done twice, once for each sample compared against the other.
By similar I do not mean the mathematical definition of the word similar. The shape of the "custom frequency distribution" is determined empirically for each sample, simply by using a cumulative frequency table plus certain additional assumptions. It is assumed that the population has a shape similar to the "custom frequency distribution" determined by the table, except that it is modeled as continuous instead of discrete.
The shape of the normal distribution used in t tests, however, is not determined empirically from the actual sample data; it is presumed to be "close enough" to the true shape of the probability distribution of the population from which the sample is collected. Although the mean and the standard deviation or variance of the normal distribution used for a t test are determined empirically, the shape is not. For example, a set of data could appear uniform (or any shape other than a normal distribution) to a rational person looking at an empirical probability plot or empirical probability table for that data, yet it would still be treated as having a normal distribution shape for a t test.
This test should work with ordinal, interval or ratio data when both samples have an odd sample size. When one or both samples have an even sample size, one can use a mean to get the median for interval or ratio data, but the test results should be less trustworthy than with an odd sample size. When one or both samples have an even sample size and the data is ordinal, it does not work to use a mean to estimate the median unless you are taking the mean of two identical values. However, in the case of an even sample size it might be possible to create a fictitious rank in between the two values closest to where the median would be (the two values whose mean you would normally take for interval or ratio data) and use that fictitious rank in place of the median.
For example, if the median would be between "10th street" and "11th street" you could create a fictitious rank of "the street that is less than 11th street but greater than 10th street", or if the median was between "3rd degree" and "4th degree" you could create a fictitious rank of "the degree that is more than 3rd degree but less than 4th degree". But you should not use 14th street as a replacement for a median that is in between 12th street and 16th street, because there might be a real 14th street that is not located at the mean of the coordinates or positions of 12th and 16th street, and you should not use 7th degree to represent a median that is in between 5th degree and 9th degree, because 7th degree might be a real rank that is not the same as the mean of 5th degree and 9th degree.
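The median rules above can be sketched in Python. This is only a minimal illustration: the function name sample_median and the ordinal flag are made up for this sketch, and returning the two middle values as a pair is just one possible way to stand in for the "fictitious rank" between them.

```python
def sample_median(values, ordinal=False):
    """Median of a sample following the odd/even rules described above.

    Hypothetical helper: for an even-sized ordinal sample with two different
    middle values, the pair (low, high) stands in for the fictitious rank
    that lies between them.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: the middle value is the median for any data type.
        return ordered[mid]
    low, high = ordered[mid - 1], ordered[mid]
    if low == high:
        # The mean of two identical values is that value, even for ordinal data.
        return low
    if ordinal:
        # Do not average ranks; keep both neighbours as the fictitious rank.
        return (low, high)
    # Interval or ratio data: use the mean of the two middle values.
    return (low + high) / 2


print(sample_median([1, 2, 3, 4, 5]))
print(sample_median([1, 2, 3, 4, 5, 6]))
print(sample_median([12, 12, 16, 16], ordinal=True))
```

In the last call the result is the pair (12, 16), meaning "the rank between 12th and 16th", instead of 14, which might be a real rank of its own.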
The test compares the median of each sample with the probability distribution of the other sample in order to estimate the probability that the median of one population is greater than the median of the other population.
Assumption 1 : The sample which is expected to have a greater median is named sample B and the one expected to have a lower median is named sample A.
Assumption 2 : If two values collected from a sample have no other collected values from that sample between them, then the probability of obtaining a value at or above a third value lying between them, when collecting a sample of size one from that population, shall be inclusively between the probabilities of obtaining a value at or above each of the two collected values.
Assumption 3 : If two values collected from a sample have no other collected values from that sample between them, then the probability of obtaining a value at or below a third value lying between them, when collecting a sample of size one from that population, shall be inclusively between the probabilities of obtaining a value at or below each of the two collected values.
How I am defining "inclusively between":
2 <= X <= 3 : in this example X is inclusively between 2 and 3
2 < X < 3 : in this example X is exclusively between 2 and 3
Assumption 4 : The median of a population shall be estimated to be the median of the sample taken from that population. Sample A is a sample taken from population A and sample B is a sample taken from population B.
Assumption 5 : If the median of sample A is lower than the median for the population estimated from sample B, and the median of sample B is higher than the median for the population estimated from sample A, both at the same alpha level, then the median of population B is higher than the median of population A at that alpha level. If exactly one of those two conditions is met, then it is unclear whether the median of population A is lower than the median of population B at that alpha level. If neither of those conditions is met, then the median of population A is not lower than the median of population B at that alpha level.
Null Hypothesis 1 : Median of sample B <= Median of population A
Alternative Hypothesis 1 : Median of sample B > Median of population A
Null Hypothesis 2 : Median of sample A >= Median of population B
Alternative Hypothesis 2 : Median of sample A < Median of population B
Null Hypothesis 3 : Median of population B <= Median of population A
Alternative Hypothesis 3 : Median of population B > Median of population A
If Null Hypotheses 1 and 2 are both rejected, then Null Hypothesis 3 is rejected. If exactly one of Null Hypotheses 1 and 2 is rejected, then it is unclear whether Null Hypothesis 3 is rejected or not. If Null Hypotheses 1 and 2 both fail to be rejected, then Null Hypothesis 3 fails to be rejected.
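The rule for combining the two tests can be written as a small Python function. This is only a sketch; the function name and the returned strings are made up for illustration.

```python
def combine_decisions(reject_h1, reject_h2):
    """Decide on Null Hypothesis 3 from the outcomes of Null Hypotheses 1 and 2.

    reject_h1 and reject_h2 are booleans: True means that null hypothesis
    was rejected at the chosen alpha level.
    """
    if reject_h1 and reject_h2:
        return "reject null hypothesis 3"
    if reject_h1 or reject_h2:
        # Exactly one of the two was rejected.
        return "unclear"
    return "fail to reject null hypothesis 3"


print(combine_decisions(True, True))
print(combine_decisions(True, False))
print(combine_decisions(False, False))
```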
Assumption 6 : The P value for rejecting Null Hypothesis 1 shall be one minus the cumulative frequency, calculated from left to right, at the median of sample B on the probability distribution table for sample A. If no value in sample A equals the median of sample B, the P value shall be estimated from the closest value above and the closest value below the median of sample B in the probability distribution table for sample A, as explained in Assumption 2. If a value in sample A equals the median of sample B, the cumulative frequency already on the table for that value shall be used, with the frequency greater than or equal to the value instead of the greater than only option.
Assumption 7 : The P value for rejecting Null Hypothesis 2 shall be one minus the cumulative frequency, calculated from right to left, at the median of sample A on the probability distribution table for sample B. If no value in sample B equals the median of sample A, the P value shall be estimated from the closest value above and the closest value below the median of sample A in the probability distribution table for sample B, as explained in Assumption 3. If a value in sample B equals the median of sample A, the cumulative frequency already on the table for that value shall be used, with the frequency less than or equal to the value instead of the less than only option.
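Here is a minimal Python sketch of how the two P values might be computed. It follows the arithmetic used in the worked Examples 1 and 2 (one minus the cumulative frequency column of the table), which is one possible reading of Assumptions 6 and 7. The function names are made up, and treating a missing neighbour as a cumulative frequency of 0 or 1 is an assumption of this sketch.

```python
from fractions import Fraction


def _share(sample, keep):
    """Fraction of the sample for which keep(x) is true."""
    return Fraction(sum(1 for x in sample if keep(x)), len(sample))


def upper_tail_bounds(sample, v):
    """(low, high) bounds on P for Null Hypothesis 1: 1 - frequency <= value."""
    if v in sample:
        p = _share(sample, lambda x: x > v)
        return p, p
    below = [x for x in sample if x < v]
    above = [x for x in sample if x > v]
    # Assumption of this sketch: with no neighbour, use 1 or 0 as the bound.
    high = Fraction(1)
    low = Fraction(0)
    if below:
        lower_neighbour = max(below)
        high = _share(sample, lambda x: x > lower_neighbour)
    if above:
        upper_neighbour = min(above)
        low = _share(sample, lambda x: x > upper_neighbour)
    return low, high


def lower_tail_bounds(sample, v):
    """(low, high) bounds on P for Null Hypothesis 2: 1 - frequency >= value."""
    if v in sample:
        p = _share(sample, lambda x: x < v)
        return p, p
    below = [x for x in sample if x < v]
    above = [x for x in sample if x > v]
    low = Fraction(0)
    high = Fraction(1)
    if below:
        lower_neighbour = max(below)
        low = _share(sample, lambda x: x < lower_neighbour)
    if above:
        upper_neighbour = min(above)
        high = _share(sample, lambda x: x < upper_neighbour)
    return low, high


sample_a = [1, 2, 3, 4, 5]
sample_b = [2.5, 3.5, 4.5, 5.5, 6.5]
print(upper_tail_bounds(sample_a, 4.5))  # median of sample B against sample A
print(lower_tail_bounds(sample_b, 3))    # median of sample A against sample B
```

Fractions are used instead of floats so the bounds can be compared directly with the fifths shown in the tables.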
Examples 1-4
Comparing results between t tests and custom distribution statistical hypothesis tests for uniform distributions in which both samples have equal variance and independent unpaired samples are used
Example 1
sample A = (1, 2, 3, 4, 5)
sample B = (2.5, 3.5, 4.5, 5.5, 6.5)
custom distribution statistical hypothesis test
sample A = (1, 2, 3, 4, 5), median = 3
value, frequency=value, frequency <= value, frequency < value
0.5, 0, 0, 0
1, 1/5, 1/5, 0
1.5, 0, 1/5, 1/5
2, 1/5, 2/5, 1/5
2.5, 0, 2/5, 2/5
3, 1/5, 3/5, 2/5
3.5, 0, 3/5, 3/5
4, 1/5, 4/5, 3/5
4.5, 0, 4/5, 4/5 (median of sample B)
5, 1/5, 1, 4/5
5.5, 0, 1, 1
1-1 <= P <= 1 - 4/5
0 <= P <= 0.2 for the median of sample B being higher than the estimated median of population A
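The rows of the table for sample A can be reproduced with a few lines of Python. This is only a sketch; the function name cumulative_row is made up, and the three values printed are just the rows around the median of sample B.

```python
from fractions import Fraction


def cumulative_row(sample, value):
    """Return (frequency = value, frequency <= value, frequency < value)."""
    n = len(sample)
    equal = Fraction(sum(1 for x in sample if x == value), n)
    at_or_below = Fraction(sum(1 for x in sample if x <= value), n)
    below = Fraction(sum(1 for x in sample if x < value), n)
    return equal, at_or_below, below


sample_a = [1, 2, 3, 4, 5]
for value in [4, 4.5, 5]:
    equal, at_or_below, below = cumulative_row(sample_a, value)
    print(value, equal, at_or_below, below)

# Bounds on P: one minus the "frequency <= value" column at the neighbours of 4.5.
p_low = 1 - cumulative_row(sample_a, 5)[1]
p_high = 1 - cumulative_row(sample_a, 4)[1]
print(p_low, "<= P <=", p_high)
```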
sample B = (2.5, 3.5, 4.5, 5.5, 6.5) median = 4.5
value, frequency=value, frequency >= value, frequency > value
7, 0, 0, 0
6.5, 1/5, 1/5, 0
6, 0, 1/5, 1/5
5.5, 1/5, 2/5, 1/5
5, 0, 2/5, 2/5
4.5, 1/5, 3/5, 2/5
4, 0, 3/5, 3/5
3.5, 1/5, 4/5, 3/5
3, 0, 4/5, 4/5 (median of sample A)
2.5, 1/5, 1, 4/5
2, 0, 1, 1
1-1 <= P <= 1 - 4/5
0 <= P <= 0.2 for the median of sample A being lower than the estimated median of population B
for alpha > P reject the null hypothesis
for alpha < P fail to reject the null hypothesis
for one tailed alpha > 0.2 the median of sample A can be considered lower than the median of population B
and
for one tailed alpha > 0.2 the median of sample B can be considered higher than the median of population A
therefore
for one tailed alpha > 0.2 the median of population B can be considered greater than the median of population A
for 0 < one tailed alpha < 0.2 it is inconclusive whether the median of sample A is lower than the median of population B
and
for 0 < one tailed alpha < 0.2 it is inconclusive whether the median of sample B is higher than the median of population A
therefore
for 0 < one tailed alpha < 0.2 it is inconclusive whether the median of population B is greater than the median of population A
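Because P is only known to lie in a range here, the comparison with alpha has three possible outcomes. A minimal Python sketch, assuming P is given as a pair of bounds; the function name is made up, and treating alpha exactly equal to a bound as inconclusive is an assumption of this sketch.

```python
def decide(p_low, p_high, alpha):
    """Compare alpha with a P value known only to lie in [p_low, p_high]."""
    if alpha > p_high:
        # alpha is above every possible P, so alpha > P holds for certain.
        return "reject"
    if alpha < p_low:
        # alpha is below every possible P, so alpha < P holds for certain.
        return "fail to reject"
    # alpha lies inside the range, so either outcome is still possible.
    return "inconclusive"


# Example 1: 0 <= P <= 0.2 for both comparisons.
print(decide(0, 0.2, 0.25))
print(decide(0, 0.2, 0.05))
```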
t - test
Null Hypothesis : mean of Population B <= mean of Population A
Alternative Hypothesis : mean of Population B > mean of Population A
sample A = (1, 2, 3, 4, 5), mean = 3, n = 5
sample variance = [ ( 5 -3 ) ^ 2 + ( 4-3 ) ^ 2 + ( 3 - 3) ^2 + ( 2 - 3 ) ^2 + (1-3) ^2 ] / ( 5 - 1 )
sample variance = ( 4 + 1 + 1 + 4 ) / 4 = 10 /4 = 2.5
sample std = sample variance ^ 0.5 = 2.5 ^ 0.5 = approx 1.58113883008
sample B = (2.5, 3.5, 4.5, 5.5, 6.5), mean = 4.5, n = 5
sample variance = [ ( 6.5 - 4.5 ) ^ 2 + ( 5.5 - 4.5 ) ^ 2 + ( 4.5 - 4.5 ) ^ 2 + ( 3.5 - 4.5 ) ^ 2 + ( 2.5 - 4.5 ) ^ 2 ] / ( 5 - 1 )
sample variance = ( 4 + 1 + 1 + 4 ) / 4 = 10 /4 = 2.5
sample std = sample variance ^ 0.5 = 2.5 ^ 0.5 = ( 5 / 2 ) ^ 0.5
mean sample B - mean sample A = 4.5 - 3 = 1.5
in this case Sp = sample std A = sample std B because the sample sizes and variances are exactly equal
t = ( mean sample B - mean sample A ) / ( Sp * [ (1/nA) + (1/nB) ] ^ 0.5 )
1/nA + 1/nB = 1/5 + 1/5 = 2/5
t = 1.5 / [ (5/2) ^ 0.5 * (2/5) ^ 0.5 ] = 1.5
degrees of freedom = df = nA + nB -2 = 5 + 5 -2 = 8
one tailed p value = .086002
https://www.socscistatistics.com/pvalues/tdistribution.aspx
one tailed p value = 0.08600
https://www.statology.org/t-score-p-value-calculator/
for one tailed alpha > 0.086 the mean of population B can be considered greater than the mean of population A
for one tailed alpha < 0.086 the mean of population B can not be considered greater than the mean of population A
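The t test arithmetic above can be checked with the Python standard library alone. This is only a sketch: the function names are made up, and the one tailed P value is obtained by numerically integrating the Student t density with Simpson's rule instead of using a statistics package or an online calculator.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Equal-variance two sample t statistic (B minus A) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_b) - mean(sample_a)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """Upper tail P value: 0.5 minus the area under the density from 0 to |t|."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3
    return upper if t >= 0 else 1 - upper


t, df = pooled_t([1, 2, 3, 4, 5], [2.5, 3.5, 4.5, 5.5, 6.5])
print(t, df, round(one_tailed_p(t, df), 4))
```

For these samples the printed P value should agree with the calculators cited above to about three decimal places.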
Example 2
sample A = (1, 2, 3, 4, 5)
sample B = (2, 3, 4, 5, 6)
sample A = (1, 2, 3, 4, 5), median = 3
value, frequency=value, frequency <= value, frequency < value
1, 1/5, 1/5, 0
2, 1/5, 2/5, 1/5
3, 1/5, 3/5, 2/5
3.5, 0, 3/5, 3/5
4, 1/5, 4/5, 3/5 (median of sample B)
4.5, 0, 4/5, 4/5
5, 1/5, 1, 4/5
5.5, 0, 1, 1
P = 1 - 4/5
P = 0.2 for the median of sample B to be greater than the estimated median of population A
custom distribution statistical hypothesis test
Null Hypothesis : Median of sample B <= Median of population A
Alternative Hypothesis : Median of sample B > Median of population A
Null Hypothesis : Median of sample A >= Median of population B
Alternative Hypothesis : Median of sample A < Median of population B
t - test
Null Hypothesis : mean of Population B <= mean of Population A
Alternative Hypothesis : mean of Population B > mean of Population A
t = 1
one tailed P = 0.173297
https://www.socscistatistics.com/tests/studentttest/default2.aspx
one tailed P = 0.17330
https://www.statology.org/t-score-p-value-calculator/
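The value t = 1 for Example 2 can be checked by hand in the same way as in Example 1. The worked equation below is only a sketch; the symbols s_p (pooled standard deviation) and the barred x (sample mean) are notation introduced here. Both samples have variance 2.5 and size 5, and the means are 3 and 4.

```latex
\[
s_p = \sqrt{\frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2}}
    = \sqrt{\frac{4(2.5) + 4(2.5)}{8}}
    = \sqrt{2.5}
\]
\[
t = \frac{\bar{x}_B - \bar{x}_A}{s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}
  = \frac{4 - 3}{\sqrt{2.5}\,\sqrt{2/5}}
  = \frac{1}{1}
  = 1
\]
```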
Example 3
sample A = (1, 2, 3, 4, 5, 6)
sample B = (2.5, 3.5, 4.5, 5.5, 6.5, 7.5)
custom distribution statistical hypothesis test
Null Hypothesis : Median of sample B <= Median of population A
Alternative Hypothesis : Median of sample B > Median of population A
Null Hypothesis : Median of sample A >= Median of population B
Alternative Hypothesis : Median of sample A < Median of population B
t - test
Null Hypothesis : mean of Population B <= mean of Population A
Alternative Hypothesis : mean of Population B > mean of Population A
Example 4
sample A = (1, 2, 3, 4, 5, 6)
sample B = (2, 3, 4, 5, 6, 7)
custom distribution statistical hypothesis test
Null Hypothesis : Median of sample B <= Median of population A
Alternative Hypothesis : Median of sample B > Median of population A
Null Hypothesis : Median of sample A >= Median of population B
Alternative Hypothesis : Median of sample A < Median of population B
t - test
Null Hypothesis : mean of Population B <= mean of Population A
Alternative Hypothesis : mean of Population B > mean of Population A
Example 5
Comparing custom distribution statistical hypothesis test with unpaired independent t test for the following triangular shaped distributions
sample A = ( 1, 2, 2, 3, 3, 3, 4, 4, 5 )
sample B = ( 2, 3, 3, 4, 4, 4, 5, 5, 6 )
custom distribution statistical hypothesis test
Null Hypothesis : Median of sample B <= Median of population A
Alternative Hypothesis : Median of sample B > Median of population A
Null Hypothesis : Median of sample A >= Median of population B
Alternative Hypothesis : Median of sample A < Median of population B
t - test
Null Hypothesis : mean of Population B <= mean of Population A
Alternative Hypothesis : mean of Population B > mean of Population A
Example 6
custom distribution statistical hypothesis test with unequal sample sizes
sample A = ( 1, 2, 3, 4, 5, 6, 7, 8)
sample B = ( 3, 4, 5, 6, 7 )
Null Hypothesis : Median of sample B <= Median of population A
Alternative Hypothesis : Median of sample B > Median of population A
Null Hypothesis : Median of sample A >= Median of population B
Alternative Hypothesis : Median of sample A < Median of population B
Copyright Carl Janssen 2022
https://en.m.wikipedia.org/wiki/Student%27s_t-test
http://web.archive.org/web/20220307082135/https://en.m.wikipedia.org/wiki/Student's_t-test
https://duckduckgo.com/?q=independent+t+test+equal+variance+and+equal+sample+size&ia=web
https://www.real-statistics.com/students-t-distribution/two-sample-t-test-equal-variances/
http://web.archive.org/web/20200804112708/https://www.real-statistics.com/students-t-distribution/two-sample-t-test-equal-variances/
https://duckduckgo.com/?q=unpaired+t+test+equal+variance+and+equal+sample+size&ia=web
https://tungmphung.com/unpaired-two-sample-t-test-independent-t-test/
http://web.archive.org/web/20210423122739/https://tungmphung.com/unpaired-two-sample-t-test-independent-t-test/
https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-sample/a/population-and-sample-standard-deviation-review
http://web.archive.org/web/20220323223702/https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-sample/a/population-and-sample-standard-deviation-review
https://duckduckgo.com/?q=standard+deviation+sample+vs+population+formula&ia=web
https://www.statology.org/population-vs-sample-standard-deviation/
http://web.archive.org/web/20210824024007/https://www.statology.org/population-vs-sample-standard-deviation/
https://duckduckgo.com/?q=test+statistic+and+critical+value+calculator&ia=web
https://duckduckgo.com/?q=p-value+and+alpha&ia=web
https://duckduckgo.com/?q=+hypothesis+one+tail+t+test+example&ia=web
https://duckduckgo.com/?q=t+test+percentile&ia=web
https://duckduckgo.com/?q=t+test+p+value+calculator&ia=web
https://www.socscistatistics.com/pvalues/tdistribution.aspx
https://web.archive.org/web/20211011165105/https://www.socscistatistics.com/pvalues/tdistribution.aspx
https://www.socscistatistics.com/tests/studentttest/default2.aspx
https://web.archive.org/web/20211011072830/https://www.socscistatistics.com/tests/studentttest/default2.aspx
https://duckduckgo.com/?q=p+value+t+test+chart&ia=web
https://www.statology.org/how-to-calculate-a-p-value-from-a-t-test-by-hand/
https://web.archive.org/web/20211123175421/https://www.statology.org/how-to-calculate-a-p-value-from-a-t-test-by-hand/
https://www.statology.org/t-score-p-value-calculator/
https://web.archive.org/web/20210306170506/https://www.statology.org/t-score-p-value-calculator/