Thursday, September 19, 2024

Probability that a value selected will be the maximum value in a uniform distribution

Copyright Carl Janssen 2024

Warning : I am writing this to think through different possibilities and do not claim that the information presented is true


When a sample of size n is collected with replacement from a population with a discrete uniform distribution over m possible values, the probability that the maximum value in the population is never sampled is

[ ( m - 1 ) / m ] ^ n
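As a quick check of this formula (the function name `prob_max_not_sampled` is mine, not from any library), the exact expression can be compared against a seeded simulation:

```python
import random

def prob_max_not_sampled(m: int, n: int) -> float:
    """Exact probability that the population maximum (one of m equally
    likely values) never appears in a sample of size n drawn with
    replacement: each draw misses it with probability ( m - 1 ) / m."""
    return ((m - 1) / m) ** n

# Seeded Monte Carlo check against the formula
random.seed(0)
m, n, trials = 10, 5, 100_000
misses = sum(
    max(random.randint(1, m) for _ in range(n)) < m for _ in range(trials)
)
estimate = misses / trials
exact = prob_max_not_sampled(m, n)   # ( 9 / 10 ) ^ 5
assert abs(estimate - exact) < 0.01
```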

This is not the same as the probability that, if the population is sampled a second time with a sample size of 1, the new value will be higher than every value in the original sample of size n

Example to explain why

Let us say the population has a uniform distribution of all whole numbers starting at 1 and ending at 10

Highest value in first sample, higher numbers missing from the sample, probability that a second sample with a sample size of 1 would achieve a higher result than the first sample's highest value

10, no higher numbers, probability 0
9, higher number 10, probability 10%
8, higher numbers 9, 10, probability 20%
7, higher numbers 8, 9, 10, probability 30%
2, higher numbers 3, 4, 5, 6, 7, 8, 9, 10, probability 80%
1, higher numbers 2, 3, 4, 5, 6, 7, 8, 9, 10, probability 90%
10 - x, higher numbers 10 - x + 1, 10 - x + 2 ... 9, 10, probability x * 10%

Knowing the highest value in the first sample lets someone estimate the probability that a second sample with a sample size of 1 will select a higher value. It does not, however, give the probability that the second sample will exceed the first sample's maximum when the first sample has not yet been drawn and its highest value is not yet known.

Such a thing would be desirable to correct a problem with the empirical distribution function, which assigns a 0% probability to values lower than the lowest number sampled and a 0% probability to values higher than the highest number sampled, neither of which is necessarily true.

However, such a correction would be more difficult because we do not necessarily know whether the population sampled by the empirical distribution function has a uniform distribution, whether it is discrete or continuous, or what its range is. The method here only works when the population is discrete with a known range, unless further things are done to take into account what is unknown beyond this starting point.

https://en.wikipedia.org/wiki/Empirical_distribution_function

Example

the population only has the whole-number values 1 through 4, and m = 4

sum of
( 4 - 4 ) / 4 * probability that highest number is 4 for sample size of n
( 4 - 3 ) / 4 * probability that highest number is 3 for sample size of n
( 4 - 2 ) / 4 * probability that highest number is 2 for sample size of n
( 4 - 1 ) / 4 * probability that highest number is 1 for sample size of n
equals
probability that a value higher than the highest value in the original sample of size n is selected when the population is sampled a second time with a sample size of 1

Probability that the maximum value sampled from the population is not the same as the maximum value in the population

[ ( 4 - 1 ) / 4 ] ^ n = ( 3 / 4 ) ^ n

Probability that highest number is 4 for sample size of n
1 - [ ( m - 1 ) / m ] ^ n
1 - [ ( 4 - 1 ) / 4 ] ^ n
1 - ( 3 / 4 ) ^ n
1 - ( probability that a single number is 3 or lower ) ^ n

Probability that highest number is 3
1 - probability that highest number is 4 - ( probability that a single number is 2 or lower ) ^ n
1 - ( 1- ( 3 / 4 ) ^ n ) - ( 2 / 4 ) ^ n
( 3 / 4 ) ^ n - ( 2 / 4 ) ^ n

Probability that highest number is 2
1 - probability that highest number is 4 - probability that highest number is 3 - ( probability that a single number is 1 ) ^ n
( 2 / 4 ) ^ n - ( 1 / 4 ) ^ n

Probability that highest number is 1
( probability that a single number is 1 ) ^ n
( 1 / 4 ) ^ n

Sum of probabilities to check if they add up to 1

1 - ( 3 / 4 ) ^ n + ( 3 / 4 ) ^ n - ( 2 / 4 ) ^ n + ( 2 / 4 ) ^ n - ( 1 / 4 ) ^ n + ( 1 / 4 ) ^ n = 1


Probability that a second sample with a sample size of 1 will give a higher result than the highest value of a first sample with a sample size of n


Sum of

( 4 - 4 ) / 4 * [ 1 - ( 3 / 4 ) ^ n ]
( 4 - 3 ) / 4 * [ ( 3 / 4 ) ^ n - ( 2 / 4 ) ^ n ]
( 4 - 2 ) / 4 * [ ( 2 / 4 ) ^ n - ( 1 / 4 ) ^ n ]
( 4 - 1 ) / 4 * ( 1 / 4 ) ^ n

equals
Probability that a second sample with a sample size of 1 will give a higher result than the highest value of a first sample with a sample size of n
equals
[ 1 / 4 ] * [ ( 3 / 4 ) ^ n + ( 2 / 4 ) ^ n + ( 1 / 4 ) ^ n ]

To show how the parts cancel, here is the same sum with the exponents removed (equivalently, with the exponents equal to 1); the cancellation works the same way for other values of the exponents.

0 + 1*( 3 / 4 ) - 1*( 2 / 4 ) + 2*( 2 / 4 ) - 2*( 1 / 4 ) + 3*( 1 / 4 ) = 3/4 + 2/4 + 1/4

This is at its highest when n = 1, for all permitted values n >= 1

3+2+1 = 6
6 / 4 = 1.5
0 < 1.5 / 4 = 3 / 8 < 0.5 < 1 

If it were greater than 0.5 it would be a problem, because by symmetry we also have to consider how often a second sample with a sample size of 1 is lower than the lowest value of the first sample

Checking results
Sum of products
Value, Frequency, probability of next sample being higher, product
1, 1/4, 3/4, 3/16
2, 1/4, 2/4, 2/16
3, 1/4, 1/4, 1/16
4, 1/4, 0, 0
Equals
6 / 16 = 3 / 8

Final Answer

From i = 1 to i = m - 1 where i changes by 1
take the sum of
[ 1 / m ] * ( i / m ) ^ n
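A small sketch of this final sum (the function name is hypothetical), checking it against the worked m = 4 example and the n = 1 closed form ( m - 1 ) / ( 2 m ):

```python
def prob_second_sample_higher(m: int, n: int) -> float:
    """Final-answer sum: for i = 1 .. m - 1, add ( 1 / m ) * ( i / m ) ^ n.
    This is the probability that a single new draw from a discrete uniform
    distribution on m values exceeds the maximum of a first sample of size n."""
    return sum((1 / m) * (i / m) ** n for i in range(1, m))

# Worked m = 4, n = 1 example from the text: 6 / 16 = 3 / 8
assert abs(prob_second_sample_higher(4, 1) - 3 / 8) < 1e-12

# n = 1 closed form ( m - 1 ) / ( 2 m ) for a few values of m
for m in (2, 10, 100):
    assert abs(prob_second_sample_higher(m, 1) - (m - 1) / (2 * m)) < 1e-12
```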

Check to make sure this never exceeds 1 / 2 for m >= 2; it is at its maximum when n = 1, for n >= 1
when n = 1
From i = 1 to i = m - 1 where i changes by 1
take the sum of 
[ 1 / m ] * ( i / m )
equals
[ ( m - 1 ) * m / 2 ] / [ m * m ] = ( m - 1 ) / ( 2 m ) = 0.5 - 1 / ( 2 m ) < 0.5
approaches 0.5 from below as m approaches positive infinity

This value approaching one half from below makes sense: the probability should split into a part above the sample value and a part below it, plus a very narrow third part at the single sample value itself. On average there is symmetry, and the excluded portion of 1 / ( 2 m ) gets smaller and smaller as m gets larger and larger


1 + 2 = 3 = [ 2 * 2 + 2 ] / 2 = ( 2 * 3 ) / 2
1 + 2 + 3 = 6 = [ 3 *3 + 3 ] / 2 = ( 3 * 4 ) / 2
1 + 2 + 3 + 4 = 10 = [ 4 * 4 + 4 ] / 2 = ( 4 * 5 ) / 2

For other values of n and m

The same sum was evaluated numerically for n = 1, 2, 3, 4, 9, 19, 24, 49 and 99:

From i = 1 to i = m - 1 where i changes by 1
take the sum of
[ 1 / m ] * ( i / m )^n

The numerical evaluations were done with https://calculator-online.net/riemann-sum-calculator/
Computer-estimated results were very close to 1 / ( n + 1 ) for m = 250

Right and midpoint Riemann sums overestimated for n = 1, so left Riemann sums were used primarily, as defined by however the calculator computes them

When n = 2, the calculated result increased as m increased and approached 1 / ( n + 1 ) from below

This was close to the earlier assumption of dividing the sample values into the gaps between nearest neighbors and assuming an equal probability of falling into each gap, with either n - 1 or n + 1 gaps depending on whether the regions above the highest selected value and below the lowest selected value are counted as gaps. Although I guessed that, I did not have a formal mathematical way to show the guess was reasonable before doing this.

I only tested m up to 250, as the calculator seemed to slow down or not work well the one time I tried m = 1000. 250 was chosen as the nearest factor of 1000 below 256.
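The calculator results above can also be reproduced in a few lines of code; this is a sketch (the function name is mine) confirming that the sum stays below 1 / ( n + 1 ) and converges to it, since it is a left Riemann sum of x^n on [ 0, 1 ]:

```python
def tail_sum(m: int, n: int) -> float:
    # Sum over i = 1 .. m - 1 of ( 1 / m ) * ( i / m ) ^ n.
    # This is a left Riemann sum of x ^ n on [ 0, 1 ], so it
    # approaches the integral 1 / ( n + 1 ) from below as m grows.
    return sum((1 / m) * (i / m) ** n for i in range(1, m))

m = 250
for n in (1, 2, 3, 4, 9, 19, 24, 49, 99):
    s = tail_sum(m, n)
    assert s < 1 / (n + 1)            # always below the limit
    assert 1 / (n + 1) - s < 1 / m    # and within 1 / m of it
```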

Monday, September 16, 2024

Piecewise Probability Distribution functions from sample data

Copyright Carl Janssen 2024

Warning : I am kind of making stuff up to think about possibilities of how stuff works. Do not use this on statistics tests or homework at an accredited university.

This is out of order: I wrote things to do later at the front, then started doing them, and I also wrote about observations from Wikipedia and other sources before the original introduction and the material I was working on earlier; there is a second group of things to do at the end. I might later redo this in a shorter version.

In order to solve the problem of repeat values, replace each repeated value with two values right next to it

For example, 2, 2 could be replaced with 2.01 and 1.99, as long as all other values are well above 2.01 or well below 1.99; better still, replace it with the limits of 2 - x and 2 + x as x approaches 0. Another possibility, if uniform distribution functions are used to fill the space between values, is to multiply the uniform distribution functions to the left and right of a repeated value by the number of times that value occurs; if the value 2 occurs twice instead of once, they are multiplied by 2, and each repeat can be counted as another gap between values. Each gap between values is presumed to have an equal probability, one divided by the number of gaps, of the next value falling anywhere within that gap's range, if the gaps beyond the maximum and minimum observed values, which I call tails in this context, are ignored or presumed to have zero probability. There are n - 1 gaps between n values if there are no repeat values and the tails are ignored, but if the tails are counted as gaps then there are n + 1 gaps. The probability of falling in a specific gap that is not a tail might be ( 1 - probability of falling on a tail ) / ( n - 1 ).

To do

Explain possible models to make it more continuous

Fill in the remaining space with uniform distribution probability density functions, except at the two tail ends, such that the probability of achieving a value between any two adjacent sample values is modeled as the same, but the cumulative probability density function looks continuous instead of like steps. This should give the same cumulative probability values at values matching the sample values, but different values in between than the step version would predict.

Emphasize that the remaining space filled with uniform distribution probability density functions could also be filled with other functions, as long as the resulting cumulative probability density function matches, at the sample values, what it would be if the function were treated as steps instead of being continuous


Look into setting the cumulative probability for variation with two symmetric tails as comparing two options with no replicated values and a sample size of n

Option 1

There are n + 1 sections, each with a probability of 1 / ( n + 1 ) of landing in that section. Smooth them out with uniform distribution probability density functions to make the data continuous, so that there are no longer discontinuities in the cumulative probability density function at the values that were in the sample.

Such that the cumulative probability ( as calculated from left to right ) of the value corresponding to index i equals i / ( n + 1 )

This still does not resolve the probability distribution function at the tails, but it lets you know the cumulative probability of landing anywhere in the unlimited range of the left tail as 1 / ( n + 1 ); the left tail cannot have a uniform probability distribution function if it goes on forever in one direction

Option 2

Set the value of the cumulative probability density function for the value at index i as one plus the number of points to its left, divided by one plus the number of points to its left plus one plus the number of points to its right. The plus 1 terms represent including the probability of landing on a tail.

i -1 points are to the left of it

n - i points are to the right of it

( i - 1 + 1 ) / ( i - 1 + 1 + n - i + 1 ) = i / ( n + 1 )

Done : Option 2 gives the same result as option 1 when tails are counted and sample values are not replicated


Compare possibilities of dealing with replication of values within sample as to how different options would effect the model

Definitions of variables for option 1 and option 2 when replicated values occur

there are n different values in the sample, with M total replications

How replication is counted matters and there are different conventions; I choose the following convention because it lets me get the sample size easily as M + n

if a value occurs twice it is only replicated once

if a value is replicated it counts toward the count of n different values but is only counted one time

q equals the number of times a value occurs

if a value occurs q times then m = q - 1 and q = m + 1

the total number of values taken in the sample is M + n

2, 2, 2, 3, 4, 5 : the value 2 occurs q = 3 times and is replicated m = 2 times, so M = 2, n = 4 and M + n = 6

values are ordered by index from lowest to highest

when identical values occur they do not get multiple indexes; the number of times they occur at that index is q ( i ) and the number of times they are replicated is m ( i )

the index i = 1 represents the lowest value

the index i = n represents the highest value

M = sum of all m(i)

Q = sum of all q(i)

m(i) = number of times the value at that index is replicated in the sample = q(i) - 1

q(i) = number of times the value at index i occurs in the sample

c(i) = cumulative probability of i

c ( 1 ) = c ( i = 1 )
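The bookkeeping in these definitions can be sketched in code (the function and variable names are mine, chosen to mirror the symbols above):

```python
from collections import Counter

def replication_counts(sample):
    """Bookkeeping from the definitions above: n distinct values,
    q(i) occurrences of the i-th smallest value, m(i) = q(i) - 1
    replications, M = sum of all m(i), and total sample size M + n."""
    counts = Counter(sample)
    values = sorted(counts)            # index i = 1 is the lowest value
    q = [counts[v] for v in values]
    m = [qi - 1 for qi in q]
    return {"n": len(values), "M": sum(m), "q": q, "m": m,
            "total": sum(m) + len(values)}

# Worked example from the text: 2, 2, 2, 3, 4, 5
info = replication_counts([2, 2, 2, 3, 4, 5])
assert info["n"] == 4 and info["M"] == 2 and info["total"] == 6
assert info["q"][0] == 3               # the value 2 occurs q = 3 times
```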

Option 1

for the value corresponding to index 1 the cumulative probability is

probability of landing on one tail is 1 / ( 1 + M + n )

c (1) = [ 1+m(1) ] / ( 1 + M + n )

c ( n ) = 1 - [ 1+m( n ) ] / [ 1 + M + n ] if this is not true then calculation from right to left is not the same as from left to right and I should reject using option 1

for i > 1

c ( i ) = c ( i - 1) +  [ 1+m( i ) ] / ( 1 + M + n )

for i < n

c ( i + 1 ) = c ( i ) +  [ 1+m( i + 1) ] / ( 1 + M + n )

Compare option 1 calculating the cumulative probability from left to right against calculating it from right to left; for each index, the left-to-right value plus the right-to-left value should add up to 1, otherwise this method should be abandoned. The calculations above are only from left to right, not from right to left.
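A sketch of the left-to-right Option 1 recursion under the definitions above (the function name is hypothetical); with no replicated values it reproduces c(i) = i / ( n + 1 ):

```python
from collections import Counter

def option1_cumulative(sample):
    """Option 1 recursion, read left to right:
    c(1) = ( 1 + m(1) ) / ( 1 + M + n ) and
    c(i) = c(i-1) + ( 1 + m(i) ) / ( 1 + M + n ),
    using counts[v] = q(i) = 1 + m(i)."""
    counts = Counter(sample)
    values = sorted(counts)
    n = len(values)
    M = sum(c - 1 for c in counts.values())
    denom = 1 + M + n
    c, out = 0.0, []
    for v in values:
        c += counts[v] / denom
        out.append(c)
    return out

# With no replicated values this reproduces c(i) = i / ( n + 1 )
cs = option1_cumulative([3, 1, 4, 2])
assert all(abs(ci - i / 5) < 1e-12 for i, ci in enumerate(cs, start=1))
```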

Option 2 

c(i) = ( 1 + the number of things to its left ) divided by ( 1 + the number of things to its right + 1 + the number of things to its left )

Probability of landing anywhere in the entire range of the left tail = 1 / ( 1 + M + n )

c ( 1 ) = 1 / ( 1 + M + n - 1 - m (1) )

c ( 1 ) = 1 / ( M + n - m (1) )

c ( n ) = 1 / ( 1 + M + n - 1 - m ( n ) )

c ( n ) = 1 / ( M + n - m ( n ) )

for i < n

c ( i + 1 ) = c ( i ) + 1 / ( 1 + M + n - 1 - m ( i + 1 ) )

c ( i + 1 ) = c ( i ) + 1 / ( M + n - m ( i + 1 ) )

for i > 1

c ( i  ) = c ( i - 1 ) + 1 / ( 1 + M + n - 1 - m ( i  ) )

c ( i  ) = c ( i - 1 ) + 1 / ( M + n - m ( i  ) )
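A sketch of one reading of the left-to-right Option 2 recursion (the function name is hypothetical). Note that under this reading, with no replicated values, the increments are each 1 / n and sum to exactly 1, leaving nothing for the tails, which is exactly the kind of thing the checks that follow are meant to catch:

```python
from collections import Counter

def option2_cumulative(sample):
    """One reading of the Option 2 recursion, left to right:
    c(1) = 1 / ( M + n - m(1) ),
    c(i) = c(i-1) + 1 / ( M + n - m(i) )."""
    counts = Counter(sample)
    values = sorted(counts)
    n = len(values)
    M = sum(c - 1 for c in counts.values())
    c, out = 0.0, []
    for v in values:
        c += 1 / (M + n - (counts[v] - 1))
        out.append(c)
    return out

# Sanity check: the cumulative values must be non-decreasing and the
# total must not exceed 1.  With no replication the total is exactly 1.
cs = option2_cumulative([3, 1, 4, 2])
assert cs == sorted(cs)
assert cs[-1] <= 1 + 1e-12
```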

Check to make sure probability does not add up to more than 1.  If probability does add up to more than 1 then see if it adds up to less than or equal to 1 when tail value is set to 0.  If it adds up to less than 1 then change tail values to add up to what is left over so that it adds up to 1.  Changing tail value will change the cumulative probability distribution of the first point on the index when calculating from left to right and the cumulative probability values of the last point on the index when calculating from right to left.  Cumulative probability from left to right + cumulative probability from right to left equals 1. 

Make sure there are no contradictions when calculating from left to right as opposed to from right to left

The right tail number plus the left tail number can be set and added to the denominator of every cumulative probability value for the values found in the sample, but only the left tail number is added to the numerator when the cumulative probability distribution is calculated from left to right

If the cumulative probability for everything adds up to more than 1 then the entire cumulative probability distribution function can be multiplied by a constant to fix this problem so that the probability is 1


Estimating tail frequency as a function of sample size for sampling from a known random uniform distribution that is pretended to be unknown and modeled with the piecewise method

The assumption has been made, for one of the models, that the frequency of values greater than the highest value measured in a sample plus the frequency of values less than the lowest value measured equals two divided by the sample size plus one

If four data points are taken as a sample and a model is constructed, the theoretical estimate is that 2 out of 5 points will be higher than the maximum or lower than the minimum of the first sample if a second sample is taken with a sample size that is a multiple of 5

If nine data points are taken, the estimate is 2 out of 10 for a second sample whose size is a multiple of 10

If 24 data points are taken, the estimate is 2 out of 25 for a second sample whose size is a multiple of 25

This can be repeatedly tested with a uniform distribution and an average can be taken of the frequency of how many data points exceed the highest or lowest value in the previous sample with a given sample size

We can see if this prediction is accurate or if a constant times this prediction is accurate

We can then use that constant to decide what the probability of landing on the left tail + the probability of landing on the right tail is for the piecewise distribution
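The repeated test described above can be run as a seeded Monte Carlo sketch (the function name is mine); for a continuous uniform distribution the observed tail frequency comes out close to 2 / ( sample size + 1 ):

```python
import random

def tail_hit_rate(n: int, trials: int = 20_000) -> float:
    """Monte Carlo: draw a first sample of size n from a continuous
    uniform distribution, then one more point, and record how often
    the new point falls outside [ min, max ] of the first sample.
    The prediction above is 2 / ( n + 1 )."""
    hits = 0
    for _ in range(trials):
        first = [random.random() for _ in range(n)]
        lo, hi = min(first), max(first)
        x = random.random()
        hits += x < lo or x > hi
    return hits / trials

random.seed(1)
for n in (4, 9, 24):
    assert abs(tail_hit_rate(n) - 2 / (n + 1)) < 0.02   # 2/5, 2/10, 2/25
```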

Tails Definition in this context

Where left numbers are treated as lower than right numbers on a numberline

I am defining the left tail as the region of values to the left of the lowest value from the sample used to decide the piecewise distribution

I am defining the right tail as the region of values to the right of the highest value from the sample used to decide the piecewise distribution



Statistical significance testing theories

Calculate the probability that the mean or median of A would be different than the mean or median of B by at least as much as the values were different in the sample

By calculate I mean give a demonstration of how to calculate it. It would probably be good to make sure no values overlap between A and B, to avoid issues with the discontinuities during the demonstration, unless the space is filled in with uniform distribution functions.

Explain how to get a lower and an upper estimate for the probability with the tails, the lower estimate being 0 and the upper estimate arbitrarily being 1 / ( sample size + 1 ), and demonstrate another test where the results are beyond the tails

1 Calculate probability that a value with a sample size of 1 taken from distribution B would be as far from the median of B as the median of A is from the median of B

2 Calculate probability that a value with a sample size of 1 taken from distribution A would be as far from the median of A as the median of B is from the median of A

3 Calculate probability that a value with a sample size of 1 taken from distribution B would be as far from the mean of B as the mean of A is from the mean of B

4 Calculate probability that a value with a sample size of 1 taken from distribution A would be as far from the mean of A as the mean of B is from the mean of A

For a sample size greater than 1, would the probability be raised to the power of the sample size? More detail on that question later.

If a statistical significance test is testing the chance that the samples really do or do not come from the same distribution, consider the consequences for the testing outcome if you compared the probability of the results against a merged distribution. Consider merging the two data sets A and B into one set C, then finding the probability of the following four things.

1 that the median of A is different than the median of C by at least as much as the measured difference between the two

2 that the mean of A is different than the mean of C by at least as much as the measured difference between the two

3 that the median of B is different than the median of C by at least as much as the measured difference between the two

4 that the mean of B is different than the mean of C by at least as much as the measured difference between the two

Is the probability of difference in that direction put to the power of the sample size?

Is the probability of items 1, 2, 3 and 4 above, for the sample size of A or B, equal to the probability that it would be so with a sample size of 1 raised to the power of the actual sample size? For example, if the probability of such a difference in that direction was 0.2 with a sample size of 1, and the sample size was 5, would the probability actually be 0.2^5, because it is the probability of that happening 5 consecutive times instead of 1 time, or would the probability with a sample size of 5 stay at 0.2 as it would be with a sample size of 1, or something else?

In order for A and C to have a difference in means of 2 or more with a sample size of 2 for A, the sample values of A would not each need to be 2 away from the mean of C; they would only need to be 2 away from it on average. This could be achieved by ( 2, 2 ) but also by ( 2 - x, 2 + x ) for the differences between the values of A and the mean of C. However, if the probability of being off by 2 + x divided by the probability of being off by 2 is less than the probability of being off by 2 divided by the probability of being off by 2 - x, so that the greater the value of x the less likely you are to be off by ( 2 + x, 2 - x ), then treating it as the probability of being off by at least ( 2, 2 ) would overestimate how likely you are to reject the null hypothesis compared to being off by ( 2 + x, 2 - x ). You might use that to argue that the probability of being off by at least 2 two times in a row is close enough to achieving the same result, but it is not, because having the option of ( 2, 2 ) or ( 2 + x, 2 - x ) gives more possibilities, and therefore a greater chance of failing to reject the null hypothesis.

Article Title : Getting Piecewise Probability Distribution functions from sample data

Possible name for distribution I thought of before hearing about the empirical distribution: Percentile based piecewise probability distribution functions (PBpdfs)

Finding out about the empirical distribution after starting to write this article

After writing an earlier draft I found the Wikipedia article on the "empirical distribution function", and it seems similar to what I already wrote about in the model variation that assumes a distribution with no tails and zero probability outside the max and min collected values. There are some differences, at the very least in that I wrote another model, or model variation, to consider the possibility of data beyond the min and max of the sample data.

Empirical distribution function wikipedia

https://en.wikipedia.org/wiki/Empirical_distribution_function

Naming

Upon this new information about naming I will call the types of distribution models I am looking at "piecewise distributions" and the type mentioned on wikipedia an "empirical distribution"

All empirical distributions are piecewise distributions but not all piecewise distributions are empirical distributions

If I find that the term piecewise distribution is used to mean something else, then I will have to change the name. Likewise, if I find the definition of empirical distribution is broader than I assumed, so that it completely overlaps with or includes things outside what I call a piecewise distribution, then I will have to change that claim.

What do I mean by a piecewise distribution? A piecewise distribution is a distribution whose probability density function is built from piecewise functions, as opposed to a distribution made with a single function that is not piecewise.

Here is a Wikipedia article on Piecewise functions

https://en.wikipedia.org/wiki/Piecewise_function

Desire to find out if there is a statistical significance test for the empirical distribution

I would like to find out whether a method of statistical significance testing that presupposes an empirical distribution function has already been designed, now that I know there is a name for a distribution similar to the type I am trying to create a model for, although as I already mentioned there are some differences. I would suggest that there are many things you can do with statistics other than statistical significance testing, which are generally more useful to society, and even if there is no premade statistical significance test for this distribution it is still useful to think about it and similar distributions for other purposes. If there is no premade statistical significance test for such a distribution, then I would like to figure out how to make one if such a thing is possible. I do not think it is good to pre-assume a normal distribution when you can get a better-matched distribution, one designed to match the probability of each point of the actual raw data.

Possibility of using a confidence interval for the empirical distribution or piecewise distribution to do a statistical significance test

"Confidence Intervals and Statistical Significance

If you want to determine whether your hypothesis test results are statistically significant, you can use either P-values with significance levels or confidence intervals. These two approaches always agree."

https://statisticsbyjim.com/hypothesis-testing/hypothesis-tests-confidence-intervals-levels/

What a confidence interval allegedly does not mean

"A 95% confidence level does not mean that 95% of the sample data lie within the confidence interval.

A 95% confidence level does not mean that there is a 95% probability of the parameter estimate from a repeat of the experiment falling within the confidence interval computed from a given experiment."

https://en.wikipedia.org/wiki/Confidence_interval

And another website with an example contradicting the Wikipedia claim about what the confidence interval allegedly does not mean

The "±" means "plus or minus", so 175cm ± 6.2cm means

175cm − 6.2cm = 168.8cm to 

175cm + 6.2cm = 181.2cm

And our result says the true mean of ALL men (if we could measure all their heights) is likely to be between 168.8cm and 181.2cm

But it might not be!

The "95%" says that 95% of experiments like we just did will include the true mean, but 5% won't.

So there is a 1-in-20 chance (5%) that our Confidence Interval does NOT include the true mean.

https://www.mathsisfun.com/data/confidence-interval.html

Interpretation of a Confidence Interval

In most general terms, for a 95% CI, we say “we are 95% confident that the true population parameter is between the lower and upper calculated values”.

A 95% CI for a population parameter DOES NOT mean that the interval has a probability of 0.95 that the true value of the parameter falls in the interval.

The CI either contains the parameter or it does not contain it.

The probability is associated with the process that generated the interval. And if we repeat this process many times, 95% of all intervals should in fact contain the true value of the parameter.

https://online.stat.psu.edu/stat504/lesson/confidence-intervals

Once data is collected, the probability that the true mean lies within the confidence interval is not 1 - alpha ( or 100% minus alpha ) but either 100% or 0%. The confidence interval is determined by the confidence coefficient, the mean and the standard deviation when using z-scores.

Interpreting the Confidence Coefficient

Interpreting the confidence coefficient requires a nuanced understanding of its implications. A common misconception is that a 95% confidence coefficient means there is a 95% probability that the true parameter lies within the interval. Instead, it reflects the long-term performance of the estimation process. In repeated sampling, 95% of the intervals constructed would contain the true parameter, but for any single interval, the true parameter either lies within it or it does not. This distinction is crucial for accurate  statistical interpretation.

https://statisticseasily.com/glossario/what-is-confidence-coefficient-explained-in-detail/


Suspicious claims about confidence intervals for the empirical distribution on Wikipedia that are either wrong or unconventional in their units, and that would not scale correctly if they are wrong about their units and conventional units are used

Calculations involving confidence intervals for the empirical distribution function are presented in these two Wikipedia articles, but the calculations make no sense to me because they are unitless and dimensionless, depending only on the sample size and the alpha value as far as I understand.

This would mean that if all the values were multiplied by a constant, the confidence interval would stay the same, and that the confidence interval would not change if the values changed. That seems to imply that if all the sample values were multiplied by a unitless real constant with an absolute value less than 1, with the alpha value and the sample size kept the same, the chance of rejecting the null hypothesis would decrease, and if they were multiplied by a constant with an absolute value greater than 1 or less than -1, the chance of rejecting the null hypothesis would increase.

The confidence interval might have been done in a different way than I am normally used to, involving a probability on the graph instead of a number with the same unit as the values measured. Normally, if probability is graphed vertically and the value assigned a probability is graphed horizontally, the confidence interval would be drawn horizontally, but I thought they described it in such a way that it would be drawn vertically. Say you measured the lengths of logs: you might plot the length of the logs on the x axis and the frequency of logs of that length on the y axis, and if you wanted to fit a normal distribution you would again plot the probability on the y axis against the theoretical length on the x axis. In such a case the confidence interval would normally have units of length, drawn horizontally on the graph, not unitless, dimensionless units of probability drawn vertically.
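For what it is worth, the band those articles describe comes from the Dvoretzky–Kiefer–Wolfowitz inequality, and its half-width is a pure probability, which matches the guess that it is drawn vertically on the CDF plot. A sketch (the function name is mine):

```python
import math

def dkw_halfwidth(n: int, alpha: float) -> float:
    """Half-width of the DKW confidence band around an empirical CDF:
    epsilon = sqrt( ln( 2 / alpha ) / ( 2 n ) ).  It is a pure number
    because it bounds | F_n(x) - F(x) |, a difference of probabilities,
    so it is drawn vertically on the CDF plot regardless of the units
    of the data on the horizontal axis."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))

eps = dkw_halfwidth(100, 0.05)       # depends only on n and alpha
assert abs(eps - 0.1358) < 0.001
```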

Illustration of the Kolmogorov–Smirnov statistic. The red line is a model CDF, the blue line is an empirical CDF, and the black arrow is the KS statistic.

https://en.wikipedia.org/wiki/File:KS2_Example.png

https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test

"The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples"

https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

https://en.wikipedia.org/wiki/Dvoretzky%E2%80%93Kiefer%E2%80%93Wolfowitz_inequality
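The KS statistic described in the quote above can be computed directly.  The following is a minimal sketch of my own (not taken from the articles): it measures the largest vertical gap between a sample's empirical CDF and a reference CDF.  The reference distribution here (uniform on 0 to 10) is just an example I chose for illustration.

```python
# Sketch: computing the Kolmogorov-Smirnov statistic by hand, to show that
# the statistic itself is a difference of probabilities (a vertical gap on
# the probability axis), which is why it is unitless.
def ks_statistic(sample, reference_cdf):
    """Largest vertical gap between the empirical CDF and reference_cdf."""
    data = sorted(sample)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data):
        f = reference_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

# Example reference distribution: uniform on [0, 10].
uniform_cdf = lambda x: min(max(x / 10.0, 0.0), 1.0)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ks_statistic(sample, uniform_cdf))  # 0.5
```

Note that the gap D is measured vertically on the probability axis, so it comes out dimensionless no matter what units the data are in.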



Introduction

I am not sure if piecewise means something special in statistics; I made up this term for what I want to do based on the concept of constructing piecewise linear functions to fit data in algebra.  Getting such a piecewise function here might be somewhat different in some ways, but similar in other ways, to working with coordinates in algebra.

Let's say you have gathered a sample of N data points from a population.  It is a common practice to assume without any proof that the data is normally distributed and calculate a mean and standard deviation and then use that for statistical significance tests.  But I would suggest that it might not be normally distributed and instead you could create a custom or piecewise distribution that might better represent the actual distribution of the frequencies of values or frequencies of ranges of values within a population.

You could simply take each of the values, count how many times that value shows up in the sample, and divide that by N, the number of data points that you collected to get the sample from the population.  I would suggest that might not be the best practice, because it assumes that only those values exist within the population: no value greater than the highest value collected, no value lower than the lowest value collected, and no value in between the values you collected, because a 0 probability is assigned to every value that you did not collect in the sample.  Values that would be assigned a 0 probability this way might actually exist within the population that the sample was collected from.
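The counting approach just described can be sketched in a few lines.  This is my own minimal illustration with made-up sample values; the weakness it demonstrates is the one discussed above, that every unseen value gets probability 0:

```python
from collections import Counter

# Naive empirical probability: each value's probability is estimated as
# (count in sample) / N.  Values never seen in the sample get probability 0.
def empirical_pmf(sample):
    n = len(sample)
    counts = Counter(sample)
    return {value: count / n for value, count in counts.items()}

pmf = empirical_pmf([3, 1, 3, 2, 3])
print(pmf)            # {3: 0.6, 1: 0.2, 2: 0.2}
print(pmf.get(4, 0))  # the unseen value 4 is assigned probability 0
```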

People might debate several other ways to create such a piecewise distribution, or whether a piecewise distribution would be more accurate than a normal distribution or some other distribution, but here are some ways a probability distribution could be assigned on a piecewise basis.  I will assume that the percentile at which a value shows up within sample data is related to the cumulative distribution, but the assumed value will not necessarily be the same as the cumulative distribution, as I will suggest caveats that modify the guessed cumulative distribution to be slightly different from the percentile.  First, there are inconsistencies in how a percentile is calculated when a value is exactly equal to a value on the list.  Second, a problem with using percentiles is how to rank the percentile for a number that exceeds or is less than all the numbers on the list; would it be realistic to assume that such values can never occur?  That assumption would be problematic unless all samples used to calculate percentiles always happened to include the world record holders.  Third, two different numbers that fall between the same two closest neighbor values used to calculate percentiles should have different cumulative probabilities of occurring; I plan to address various ways to deal with this by filling in functions, or to somewhat ignore it by estimating the probability of being within a specific range of values without specifying the probability of one value within that range compared with another.

I also plan to show an example of one way you could test which model fits a random sample of data better, a normal distribution or a piecewise distribution, using an example with a presumed uniform random sample of data from random.org, which might of course give different results than other ways you could test this which I will not specify.

Percentiles and Discontinuities

Let's say you have a list of values, and you want to take a single value that might or might not be the same as a value on the list, and you want to say what fraction of the numbers on the list are equal to, greater than, or less than that number.

Let us take the example of the list 1, 2, 3, 4

All numbers less than 1 are less than 100% of the numbers on that list

1 is equal to 1 out of 4 numbers on that list and less than 3 out of 4 numbers on that list and greater than 0 out of 4 numbers on that list, it is less than 3 out of 3 numbers that are not equal to it and greater than 0 out of 3 numbers that are not equal to it 

numbers greater than 1 and less than 2 are equal to 0 out of 4 numbers on that list and less than 3 out of 4 numbers on that list and greater than 1 out of 4 numbers on that list

2 is equal to 1 out of 4 numbers on that list and less than 2 out of 4 numbers on that list and greater than 1 out of 4 numbers on that list, it is less than 2 out of 3 numbers that are not equal to it and greater than 1 out of 3 numbers that are not equal to it 
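The counting in the examples above can be written out as a small helper.  This is my own sketch; the function name is made up for illustration:

```python
# For a value v and a list of values, count how many list entries v is
# equal to, less than, and greater than.
def compare_counts(v, values):
    equal = sum(1 for x in values if x == v)
    less = sum(1 for x in values if v < x)      # entries v is less than
    greater = sum(1 for x in values if v > x)   # entries v is greater than
    return equal, less, greater

values = [1, 2, 3, 4]
print(compare_counts(2, values))    # (1, 2, 1): equal to 1, less than 2, greater than 1
print(compare_counts(1.5, values))  # (0, 3, 1): strictly between 1 and 2
```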



Data values will be ordered from lowest to highest with the assumption of no repeat values for now

i = 1 the index for the lowest value collected in the sample

i = 2 the index for the second lowest value collected in the sample

i = N -1 the index for the second highest value collected in the sample

i = N the highest value collected in the sample

x ( i ) = the value collected in the sample that corresponds to the index of i

X = a random value from a different sample collected from the same population that x values were collected from 


f ( X ) = the assumed probability of getting a value of X ( this definition might not be used )

F ( X ) = the assumed probability of getting a value of less than or equal to X ( this definition might not be used )


There might or might not be a point of discontinuity in the probability at exactly each of the collected values for some of these methods; < and > will be used instead of <= and >= for now, and continuous data will be assumed for now

Explanations will be made for how to do so if no duplicate values were collected in the sample for now

Probability refers to the probability value that will be assigned for the piecewise Probability distribution function

Only scalar real numbers will be used for this for now


Assuming a distribution with no tails and zero probability outside max and min collected values

P( X >= x ( 1 ) ) = 1

P( X < x ( 1 ) ) = 0

P( X > x ( N ) ) = 0

P( X <= x ( N ) ) = 1

P ( x ( i - 1 ) < X < x ( i ) ) = 1 / ( N - 1 ) for i >= 2 and i <= N

P ( x ( i ) < X < x ( i + 1 ) ) = 1 / ( N - 1 ) for i >= 1 and i <= N - 1


Assuming a distribution with two Symmetric Tails of unlimited length

P( X >= x ( 1 ) ) = 1 - ( 1 / ( N + 1 ) )

P( X < x ( 1 ) ) = 1 / ( N + 1 )

P( X > x ( N ) ) = 1 / ( N + 1 )

P( X <= x ( N ) ) = 1 - ( 1 / ( N + 1 ) )

P ( x ( i - 1 ) < X < x ( i ) ) = 1 / ( N + 1 ) for i >= 2 and i <= N

P ( x ( i ) < X < x ( i + 1 ) ) = 1 / ( N + 1 ) for i >= 1 and i <= N - 1
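The two conventions above can be sketched together in one function.  This is my own minimal illustration: given sorted sample values x(1)..x(N) with no repeats, each convention assigns equal probability to the intervals between neighboring sample values, and they differ only in whether the two tails outside [ x(1), x(N) ] get probability 0 or 1 / ( N + 1 ) each.

```python
# Probabilities for the intervals (-inf, x1), (x1, x2), ..., (x(N-1), xN),
# (xN, +inf), under the no-tails and symmetric-tails conventions above.
def interval_probabilities(sorted_values, tails=False):
    n = len(sorted_values)
    if tails:
        p = 1.0 / (n + 1)   # N - 1 interior intervals plus 2 tails
        return [p] * (n + 1)
    p = 1.0 / (n - 1)       # only the N - 1 interior intervals get mass
    return [0.0] + [p] * (n - 1) + [0.0]

print(interval_probabilities([10, 20, 30, 40]))             # no tails
print(interval_probabilities([10, 20, 30, 40], tails=True)) # [0.2] * 5
```

In both cases the interval probabilities sum to 1, which is the sanity check that each convention forms a valid distribution.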


https://en.wikipedia.org/wiki/Cumulative_distribution_function


Testing piecewise compared with normal distribution

Generate uniformly distributed random data between 0 and 100 with N data points, excluding repeat values

Replace excluded data points with new random data points

Create probability piecewise probability distribution function based on that data

Calculate normal distribution curve with mean and standard deviation based on that data

Create N + 1 data point sections that fit into boxes, each of which has a 1 / ( N + 1 ) probability of X landing somewhere inside it; the two end boxes extend toward positive and negative infinity, having a limit on one end but no limit on the other

Create such data point section boxes for normal distribution curve and also for the piecewise distribution

The boxes for the normal distribution will look symmetric but not necessarily the ones for the piecewise distribution 

Generate uniformly distributed random data between 0 and 100 with N + 1 data points, excluding repeat values, values identical to the previously generated data points, and values that fall exactly on the box boundaries so that you cannot tell which of the two neighboring boxes they fit in

Replace excluded data points with new random data points

Each box should have exactly 1 data point inside it

A box with 0 data points counts as 1 error

A box with more than 1 data point in it counts as 1 error for every extra data point it has inside it beyond 1

See if the normal distribution or the piecewise distribution results in more errors or if they both result in the same number of errors
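The steps above can be sketched as a small simulation.  This is my own minimal version under simplifying assumptions: for the piecewise model the box boundaries are the N sorted sample points themselves (the no-tails and symmetric-tails conventions both place the interior boundaries there), and for the normal model the boundaries are the k / ( N + 1 ) quantiles of a normal fitted to the first sample; all names are made up for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

# Count errors as described above: an empty box is 1 error, and a box with
# c > 1 points contributes c - 1 errors.
def count_box_errors(boundaries, new_points):
    counts = [0] * (len(boundaries) + 1)
    for p in new_points:
        i = sum(1 for b in boundaries if p > b)  # index of the box p lands in
        counts[i] += 1
    return sum(abs(c - 1) for c in counts)

random.seed(0)
n = 20
first = [random.uniform(0, 100) for _ in range(n)]       # first sample
second = [random.uniform(0, 100) for _ in range(n + 1)]  # test sample

piecewise_bounds = sorted(first)
norm = NormalDist(mean(first), stdev(first))
normal_bounds = [norm.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]

print("piecewise errors:", count_box_errors(piecewise_bounds, second))
print("normal errors:", count_box_errors(normal_bounds, second))
```

With N + 1 test points and N + 1 boxes, a perfect fit would put exactly one point in each box and score zero errors; whichever model scores fewer errors fits better under this test.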

Why I am against statistical significance tests

 Copyright Carl Janssen 2024 September 16

Why I am against statistical significance tests

Presenting an informed public with raw data versus using statistical significance without raw data to dupe the public; or, reasons why I am against statistical significance tests as an over-glorified and often more harmful than beneficial standard in academia

This article is an incomplete description of my reasons, as there are more.  For example, some reasons involving questioning the very idea of there being random events with a probability are not addressed.

Although some of the ideas in probability based statistics might have advanced the cause of science more than they harmed it, I suspect that the idea of statistical significance tests has done more to harm the cause of science than to advance it, or at the very least that using it as a common standard in scientific journal articles, and as an assumed default research method whenever anything involving measured values is done in the biological and social sciences, has done more harm than good.

A common practice in peer reviewed scientific journal articles is to assume data is normally distributed, where whether or not the data is normally distributed is often not tested, and then to arbitrarily assign an alpha value for a statistical significance test.  This is often a t-test, even though many other types of statistical significance tests are available which could have been used instead and might have made more sense if the data was not normally distributed, or was not quantitative or interval data but was treated like it was.  The public is often then told whether or not the test results are statistically significant based on that t-test in media publications if this is politically expedient to meet their goals, but if it is not politically expedient the results might be less likely to be mentioned.  When the data is published, the raw data is often excluded and only the mean, sample size and standard deviation are typically presented as a summary, which prevents alternative ways of analyzing the data mathematically or statistically, unnecessarily hindering the advancement of science by suppressing access to data that funding and research already went into.  I would suggest that in return for this data suppression the most appropriate response of the public would be to call into question the legitimacy of all conclusions involving statistical significance tests for which the raw data, except that which is necessary to protect subject or patient confidentiality, is not published.

I believe that if probability based statistics is to be used for public good, it would be more useful to simply collect sample data from a population and then try to use it to make predictions about the frequency of different outputs, or ranges of outputs, in different conditions to meet one's goals.  These goals might be different in different circumstances, so I believe that taking raw data, assigning it a presumed distribution such as a normal distribution with a given mean and standard deviation, publishing that in a peer reviewed journal, and then hiding the raw data from the public is a disservice to science compared with simply giving the raw data and letting the public decide what they want to do with it based on their goals and what type of distribution they think it has.  Of course one might object to publishing the raw data because it violates patient or subject confidentiality in the biological and social sciences.  Well, one should then ask whether publishing the mean and standard deviation violates confidentiality; to some degree it could, but not so much if you remove the subject names.  But I would suggest you can simply publish the raw data from which the mean and standard deviation were calculated, and likewise remove the subjects' names and any other personal information through which they might be identified.

Although I do not like the idea of doing statistical significance tests at all, if they must be done then I would suggest only publishing the one and two tail P-values, which would let the public know what alpha values would have achieved one or two tailed significance, instead of arbitrarily assigning an alpha value and then telling the public whether or not the study was statistically significant based on that arbitrarily preassigned choice.  The problem with assigning an alpha value and then telling the public that something is or is not statistically significant is that it is extremely misleading when it comes to applications, because something might be said to have made no difference which would have made a difference if a different alpha value had been assigned, or said to have made a difference when it would have made no difference under a different alpha value.  It also leads to a problem in that journals are more likely to choose to publish research if statistical significance has been achieved, so sometimes researchers will repeatedly do the same experiment and only publish it when statistical significance has been achieved for an arbitrarily assigned alpha value that might fit the journal's goals for what is acceptable.  This results in extremely biased research that makes it look like statistical significance for a certain alpha value is achieved for a certain type of experiment more often than it would be if the times that statistical significance was rejected were also published.  The journal's goals for what alpha values should be used are not necessarily the public's goals, which vary depending on what the individual within the public wants to achieve under what circumstances.

The alpha values in statistical significance tests are arbitrary; if the research is done "blind" then whatever value the researcher assigns should not affect what data was collected, and someone could just as well have assigned another value, changing the result from significant to not significant according to that alpha value.  I would suggest that if a scientific journal article's authors want to decide that they are going to treat the data like it is a normal distribution for some part of their article's analysis, that is fine, but they should still publish the raw data and let the public decide whether or not it is a normal distribution.  But as for the alpha value, I believe the public would be better off if no alpha values were used at all to decide if the data is statistically significant or not.  Instead, I would suggest people would be better off if they gave a two tail P value and an additional one tail P value in whatever direction the one tail t-test would succeed in, or perhaps two one tail P values, one in each direction.  The public would then know what alpha values would have resulted in statistical significance for a two tail t-test, or for a one tail t-test in whatever direction was assigned.  Depending on the one or two tail alpha values they would desire as the criteria necessary to meet the specific goals they want to accomplish, they could then make the decision for what to do based on those one and two tail P values.

How would the public decide what they want to use based on goals?  The closer the alpha value is to zero, the less likely they are to accidentally reject the null hypothesis when they "should have" accepted it or failed to reject it.  The farther the alpha value is from zero, the more likely they are to make that kind of error, but the less likely they are to accidentally fail to reject the null hypothesis when they "should have" rejected it.  There is no perfect alpha value that is identical for all individuals with all goals in all situations, since changing the alpha value does not reduce the chance of making an error but only reduces the chance of making one type of error in exchange for increasing the chance of making another type of error.  In one situation, for one individual with a certain goal, avoiding one type of error might be more important than avoiding another type of error, and they should choose the alpha value according to their specific goal in that specific situation, if alpha values are actually ever applied to anything in real life.
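The trade-off just described can be illustrated with a small simulation.  This is my own sketch under simplifying assumptions (a one-tailed z test with known standard deviation 1, not any particular published procedure): when the null hypothesis is actually true, the fraction of experiments "rejected" comes out close to whatever alpha was chosen, so raising alpha buys more false rejections.

```python
import random
from statistics import NormalDist

random.seed(1)

# Simplified one-tailed z-test p-value for the null "mean = 0", known sd 1.
def one_sided_p(sample_mean, n):
    return 1 - NormalDist().cdf(sample_mean * n ** 0.5)

trials = 2000
n = 25
p_values = []
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]  # null is true here
    p_values.append(one_sided_p(sum(sample) / n, n))

for alpha in (0.01, 0.05, 0.20):
    rate = sum(p < alpha for p in p_values) / trials
    print(f"alpha={alpha}: false rejection rate ~ {rate:.3f}")
```

Each false rejection rate lands near its alpha, which is the sense in which the choice of alpha only moves probability between the two kinds of error rather than eliminating error.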

But are alpha values ever actually applied to any honest goal in real life?  I would suggest no.  I would suggest that none of the uses of alpha values encourage an application in a real life situation in meeting a goal other than persuading people, and that the goals involving alpha values in real life situations that do involve persuading people are never the type of persuasion done in an ethical manner free of undue influence.

What are some of these reasons

1 To persuade a journal to publish something not to advance the cause of science but to accumulate more publications for a career goal.  I am not saying that career goals are bad as career goals can be good or bad based on the motive and the results, but I would suggest that both the motives and the results of this career goal are bad because they are misleading the public in exchange for money.

2 To trick someone into doing something based on something being or not being significant according to a journal.

3 To simply make it through an assignment that you have been unduly influenced into doing so that you can prove you know how to do statistical significance tests without thinking about the actual science of things.

4 To boost your ego in a bad way and feel like you have objectively proven something you predicted in advance that is not so clearly and unambiguously proven at all, because if it were clear and unambiguous you would not need a statistical significance test to prove it in the first place; a model would exist in which a specific output can be predicted for a specific input using some combination of algebra, trigonometry and calculus equations with no statistical probability theory invoked at all.  Someone might for instance insist they need to assign a one or two tail test and an alpha value before doing the experiment to "eliminate bias" so they can say they called it correctly in advance and boost their ego in a bad way.  I would suggest that if they were not invested in their ego in a bad way, they would be comfortable with not needing to say they "called it" or "predicted it" correctly, but would simply publish the data with the P values but no alpha value, as I already suggested, and let the public come to their own conclusions.  However, I would suggest that even publishing the P values for a certain type of statistical significance test is not necessary, because statistical significance tests are not really used for real honest applications, and if the scientists really let go of their ego in a good way they would simply publish the raw data of the experiment and let the public do whatever they want with it.

OK, but if you are not going to do statistical significance tests, and you are going to claim that maybe using probability based statistics with raw data is good for something, then what would the public do?  Let's say there is an experiment with data for group B, which has experiment variation B done to it, and data for group A, which has experiment variation A done to it.  The public should simply choose the real life application closer to the process in either A or B by counting the number of data values that are closer to the results they want, divided by the number of data values, and then choosing A or B for the application based on that.  This could be, for example, choosing which one generates a higher percentage within a data range, or has a higher or lower mean, median, mode or some other function result.  Normally the public wants to get a certain type of value as the result when doing a certain type of action, and they should simply choose whether A or B would get that type of result more frequently.  This type of application can often be done with the raw data without knowing whether or not the data has a normal distribution.

For example, a percentile chart can be made without ever figuring out what type of distribution something is.  Someone can simply list the raw data and then look at how many data points are above and how many are below a given data point to estimate the percentile, without knowing what type of distribution the data has.  But instead some people have made this needlessly complicated, and I would propose they have not increased the accuracy of the estimation in doing so but decreased it in most cases.  First they calculate the mean and the standard deviation, then they hide the raw data.  Next they tell someone how to estimate their percentile based on how many standard deviations they are away from the mean, which may give a different result than estimating the percentile by counting how many points are above and how many points are below that point.

For example, let's say you are a shoe salesman and you want to stock shoes of each size based on the frequency with which the public has those shoe sizes.  You could simply count what percent of people in a sample have each shoe size using the raw data.  Why would you waste your time calculating the mean and then the standard deviation, hiding the raw data from yourself, and using the mean and standard deviation to estimate frequencies for each shoe size, which might be less accurate than just using the raw data to count frequencies?
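The shoe example can be sketched both ways side by side.  This is my own illustration with made-up sizes: the direct route counts frequencies from the raw data, and the roundabout route fits a normal distribution and reads off the probability mass near each size.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev

# Hypothetical raw shoe-size data for illustration.
sizes = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12]

# Direct counting from the raw data:
counts = Counter(sizes)
direct = {s: counts[s] / len(sizes) for s in sorted(counts)}

# Normal-fit estimate: probability mass within +/- 0.5 of each size.
norm = NormalDist(mean(sizes), stdev(sizes))
fitted = {s: norm.cdf(s + 0.5) - norm.cdf(s - 0.5) for s in sorted(counts)}

for s in sorted(counts):
    print(f"size {s}: counted {direct[s]:.2f}, normal fit {fitted[s]:.2f}")
```

The two columns need not agree; whether the extra modeling step helps or hurts is exactly the question the paragraph above raises.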

Let's say you are a cardiologist and you have to decide which medicine and what dose to prescribe for a patient, and each medicine and dose increases or decreases the blood pressure by a certain amount, whether as a percentage or as an absolute quantity.  Let's say you want to increase or decrease the blood pressure by no more than one amount but no less than another amount.  You could simply look at a list of Group A with medicine A at dose A and Group B with medicine B at dose B.  You could then count which group has a higher percentage of listed values that meet the criteria of not modifying the blood pressure too much or too little in whichever direction you want, and then pick that medicine.  To do this actual application does not require figuring out whether or not the data is normally distributed and getting a mean and a standard deviation, and it certainly does not require removing the raw data so that you cannot see it.

Now someone might object that just using the data is problematic because maybe you do not have enough data points, and you need to find out whether you have enough data points to be certain enough of whatever.  And I would simply say I am not objecting to getting more data points.  But you have whatever data points have been collected, and you still have to make a decision.  Sometimes you have to make a decision with what limited data you have and do not have the time or other resources to collect additional data.  And no matter how much or how little data you have, using this method is going to be better than burying your head in the sand because you do not have enough data to get a statistically significant result that you feel is powerful enough for an artificially assigned alpha.

Moreover, I would suggest that with this method you can get more data.  If all the raw data were published for past experiments, then someone could simply merge the data from replicated past experiments to get a list of data points to use with this method, instead of using statistical significance test methods to remove replications of studies that are not statistically significant, as has historically been the common practice of many scientific journal articles.  Merging the raw data of replications of experiments would result in having enough data points that insufficient data would not be as much of a problem.  On the other hand, hiding the raw data and also hiding experiment replications that were not statistically significant increases the problem of not having enough data points to be certain enough.

How could data be merged?  You would not change the data from old experiments.  But let's say there is experiment 1 and experiment 2 and so on, and each experiment has data for group A with treatment A and data for group B with treatment B.  So group 1A would be data in experiment 1 with treatment A, and group 3B would be data from experiment replication number 3 with treatment B.  You could combine all the listed data from group A for all the replication numbers into a single list, and combine all the data from group B across the replication numbers into a single list, thereby collecting replications of the same experimental treatment from multiple scientific journal articles.  But this can only be done if raw data is published, and cannot be properly and correctly done if raw data is removed and only means, standard deviations, and statistical significance based on certain alpha values are given.
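The merging scheme just described can be sketched in a few lines.  The numbers here are made up for illustration; the point is that each replication's raw lists stay unchanged and are simply concatenated per treatment:

```python
# Hypothetical raw data from three replications of the same experiment,
# each with a treatment-A group and a treatment-B group.
experiments = {
    1: {"A": [5.1, 4.9, 5.3], "B": [6.0, 6.2]},
    2: {"A": [5.0, 5.2],      "B": [5.9, 6.1, 6.3]},
    3: {"A": [4.8],           "B": [6.4]},
}

# Pool every treatment-A list into one list, and every treatment-B list
# into another, without altering any individual experiment's data.
pooled_a = [x for exp in experiments.values() for x in exp["A"]]
pooled_b = [x for exp in experiments.values() for x in exp["B"]]

print(len(pooled_a), len(pooled_b))  # 6 6
```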

The pressure to fabricate data: if students are given a homework assignment, sometimes they are told to do a statistical significance test.  They might know the prediction the teacher expects and change the data on their homework so that the statistical significance test gets the outcome they think the teacher wants, if they erroneously believe the teacher will give them better grades when the results match the teacher's prediction; at least I hope such a belief would be erroneous and the teacher would not subtract points if the results did not match their prediction.

The problem of the ability to fabricate data: statistical significance tests are often used where a degree of randomness is assumed.  If randomness is assumed, then the results would be expected often not to replicate the same.  If results are often expected not to replicate the same, then someone could simply not do an experiment at all and make up data, and since replication is not expected because the data is random, no one would be able to argue that the person did not just make up the data, based on this theory of randomness.  I would suggest that we seriously contemplate the possibility that when we read scientific journal articles, people have simply fabricated data without running an experiment at all, and that might explain part of the reason why people who try to rerun the experiment cannot get data similar enough to the data in the journal article to consider the results replicable.

So you can use data and do a lot of things to figure out the frequency that results fall in certain value ranges that meet or fail to meet your goals based on raw data, and in my opinion that is a better application to help the public than doing statistical significance tests with research time and "money" or material resources.  Although I think that is an improvement, I still do not think it is the best use of resources in science.

I would suggest that this statistical way of looking at frequencies that achieve goals based on lists would be better replaced by putting more emphasis on using equations involving algebra, trigonometry and calculus that predict an output for a given input.  These equations could assign a margin of error for each input and a range of potential output with a certain margin of error for the input values.  But do we need statistical probability models for margin of error?  No!  If we have a ruler and we have to round to the centimeter, then we could assign the maximum and minimum value that the actual distance could be within the range that is reasonable after rounding, based on the location of the physical markings on the ruler, and no statistical probability distribution models are needed for that.  We would plug the input values that could potentially be there, considering the margin of error, into an equation that makes predictions based on the inputs, and get predicted output values.  If the result is inside the range of the predicted output values then the equation is considered to have made a correct prediction, and if it is outside the range then it is considered a wrong prediction.  If we find out the equation predicts things incorrectly, then we make a new equation that would have predicted the results correctly, rerun the experiment, and see if that equation now predicts correct results.
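The ruler idea can be sketched concretely.  This is my own illustration with a made-up prediction equation: a length recorded as 7 cm (rounded to the nearest centimeter) could really be anywhere in [6.5, 7.5], and propagating that interval through the equation gives an output range with no probability model involved.

```python
# Hypothetical prediction equation for illustration; since it is monotonic,
# plugging in the endpoints of the input interval gives the output range.
def predict(length_cm):
    return 2.0 * length_cm + 3.0

measured = 7.0                        # rounded ruler reading in cm
lo, hi = measured - 0.5, measured + 0.5  # rounding margin of error

out_lo, out_hi = predict(lo), predict(hi)
print(f"predicted output between {out_lo} and {out_hi}")

# A later measured output is judged correct if it lands inside the range:
print(out_lo <= 17.2 <= out_hi)  # True: 17.2 is inside [16.0, 18.0]
```

For a non-monotonic equation the endpoints alone would not be enough, but the same interval idea still applies.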

Statistical significance testing often, although not always, does not have the ability to predict an output for a given input.  I say often but not always because there is an exception called linear regression, which allows one to predict an output for a given input, for example.  Statistics is often only used to predict whether two outputs will be different from each other, the same, or one greater than the other, but it is not usually used to predict by how much two outputs will differ.  You might get a mean and a standard deviation, but you usually cannot get an equation to guess what the mean and standard deviation will be, as a function of the input, the next time you run the same experiment.

I would suggest the world would be a better place if people focused their research on finding algebra, trigonometry and calculus based equations that accurately predict outputs based on inputs within the expected margin of error of the inputs, rather than focusing on experiments so poorly designed that you claim the reason you cannot make accurate predictions of output values is some random variability that limits you to only guessing which group is greater or less than the other, but not by how much.  Although I really do not like statistical significance testing, and would consider the so-called necessity of statistical significance testing to be a sign that your experiment was poorly designed, I would suggest that there is a place for statistics in science.  Statistics can be a starting place where you have to admit that you really do not have a clue what you are doing and your body of knowledge has not yet achieved the level of a competent scientific model; one might call this pre-science, proto-science or primitive science.  Maybe you can use it for a little while if you admit that you do not yet know what you are doing.  But eventually you should move on and make progress with your models to the point where statistical significance testing is no longer needed and you have a grown-up level of competence in that scientific area of study, where you can make predictions using algebra, trigonometry and calculus.  This means that as science advances in a field of study, more and more algebra, trigonometry and calculus should be used in peer reviewed journal articles, and less and less statistical significance testing.
Unfortunately, the trend seems to me to be in the opposite direction in the biological and social sciences, which would suggest to me that we are not making progress forward but going backward.  The fact that statistical significance tests were not immediately mocked and abandoned by the community of people who call themselves scientists, but instead embraced and pushed on graduate students in most fields of biological and social sciences, suggests that in many ways the biological and social sciences are going backwards and not forwards, in spite of increased material resources as more electronic tools to store and measure data points are manufactured, which would have given further resources to move these so-called sciences forward if another type of methodology were used.

Lastly, I would suggest that the so called social sciences might be better off if they went back to roots that were less quantitative, and were not called sciences at all but were thought of more like philosophies and religious worldviews about human behavior that might or might not be true.  A person could present an idea about human behavior and the mind, and the audience could simply think and contemplate whether it might be true, instead of the presenter trying to prove the claim by presenting the illusion of scientific objectivity through the so called scientific process of statistical significance testing.  The claims made in the so called social sciences were grand claims that can not be supported by science but are necessary to think about before conducting science in the first place, much like different religious or philosophical worldviews about morality, free will, the nature of the human mind and so on.  Before I think about whether choosing to do A or B results in whatever output, I must presuppose my ability to choose how I run my experiment; this is a philosophical prerequisite for science, not science itself.  Social "sciences" have cut themselves short by pretending to be science through the false objectivity of statistical significance testing instead of embracing their grand place as part of philosophy and religion.

Saturday, September 7, 2024

Vacuous Truth or Vacuous Falsehood

 Copyright Carl Janssen 2024 September 7

Vacuous Truth or Vacuous Falsehood


Let's say two days ago, or in other words the day before yesterday, Rob said, "if I do P tomorrow then Q will happen tomorrow"

Now today Samantha says, "Q never occurred yesterday so Rob was lying"

In reply Alexander says, "A statement can exclusively be only one of two options, true or false, not both, and there is no third possibility.  P did not happen yesterday, so Rob's statement is not false, and that leaves only one other option, which makes Rob's statement vacuously true.  All conditional statements with false antecedents, the antecedent also being called the protasis, are counted as true regardless of their consequents, also called the apodosis"

Samantha replies, "If Rob had said, 'if I do P tomorrow then Q will not happen tomorrow', would that also have been vacuously true?"

Alexander replied, "yes"

Then Samantha said, "Since we know that Rob said 'if I do P tomorrow then Q will happen tomorrow', but we also know that it is vacuously true that Q would not have happened tomorrow, this proves that Rob's statement that Q would have happened is vacuously false, because if it is true that Q did not happen that day then it is false that Q did happen that day"

Alexander replied, "P did not happen yesterday so there is no evidence that Rob's statement is false and if there is no evidence that it is false then it must be true"

Samantha replied, "The only way for there to be evidence that Rob's statement was true would be if P did happen yesterday and Q also additionally happened yesterday and since there is no evidence that Rob's statement is true then it must be false."

Alexander replied, "Now you got me all confused Samantha.  The statement can only be true or false and only one of those two options and those are the two options.  But a compelling case can be made that it is true and also a compelling case can be made that it is false."

Samantha replied, "Although you are correct that a statement can not be both true and false simultaneously when measured in the same way, the reason for your confusion is that there are more than two options.  You can not know whether Rob's statement is true or false, because P never happened yesterday, so Rob's statement is untested.  We could try to guess whether Q would have happened if P had happened, and guess whether the statement would be true or false on that basis, but since P never happened his statement is better described as untested than confirmed to be true or false.  Always using the type of logic that limits things to the two options of true or false does not line up well with the scientific method, because some things, although in reality they may be only true or false, are untested, so we should label them as untested and admit that we do not know whether they are true or false instead of insisting on assigning them a value when we do not know which value is correct.  Also, depending on how you look at things, maybe there could be other options than true or false.

It is important to keep in mind that being proven true is not the same as being true, and being proven false is not the same as being false.  Something can not be both true and correctly proven false if you are measuring with a single consistent standard, but something can be both true and not proven true at the same time.  Likewise something can not be both false and correctly proven true under a single consistent standard, but something can be both false and not proven false at the same time.  One might argue that under a certain standard there could be two potential values of one kind for a certain statement, true or false, but simultaneously three potential values of another kind: proven true, untested or proven false.

If something is not proven true then it could be true, untested, proven false or false, but it could not be proven true.  If something is not proven false then it could be false, untested, proven true or true, but it could not be proven false.  If something is correctly proven true then it could only be true.  If something is correctly proven false then it could only be false.  If something is true then it could be proven true or untested, but it could not be false nor correctly proven false.  If something is false then it could be proven false or untested, but it could not be true nor correctly proven true.  If something is untested then it could be true or false, but it could not be proven true nor proven false.  Under this standard, if something is proven true then it is both true and proven true, and if something is proven false then it is both false and proven false.

This is only one standard of looking at things.  Under another standard there could be more nuanced options than true and false; under such a standard something could not be simultaneously both true and not true, but being not true would not always mean it is false, because there could be a third option.  Likewise something could not be simultaneously both false and not false, but being not false would not always mean it is true, because there could be a third option.  It is also important to keep in mind that, depending on the viewpoint, saying that some object S is not not K is not necessarily the same as saying the object S is the object K, although it could be the same under another viewpoint; maybe one of those two viewpoints is wrong, but maybe both viewpoints, or even additional viewpoints, could be right depending on the circumstances.  One of these viewpoints is that if an object S is not not the object K then S is the same as K; I will not go into further detail on this viewpoint because it is the standard viewpoint.
Now for an unorthodox viewpoint.  Let's say there is a computer programming function where you select an input from a list of three words, spoon, fork or knife, and it gives you an output that is one of those three words but is not the word you selected.  So if you select spoon you will get an object that is not a spoon, such as a fork or a knife.  But if you run the function a second time with the output you got from the first run, you could end up with a spoon again, but you could also end up with a fork or a knife, as long as it is not the same object as the first output.  You could say that the function negates your choice, so running it twice is negating your choice and then negating it again, but you would not necessarily end up with the same choice you started with, even though it is claimed that double negating something returns you to where you started.  Perhaps double negation only guarantees consistently ending up where you started if you negate something by selecting the list of every object that is not on your list, with not a single object from your list included, each time you negate.  To be more technical, we could talk about every item in a sample space, as opposed to every item including items we are not working with, and still have it work so long as we never list items outside the sample space when negating, and stick with the same sample space, neither adding to it nor subtracting from it in any operation, although I am not sure this is worded correctly because the language is a bit too technical in its definitions for me at this point.  I also want to point out that double negation only consistently works this way in English, where you end up with what you originally had, when the two words that say 'not' are right next to each other with no words in between in a sentence.
For example, to say that if something is a dog it is an animal is a statement that is always true, but to say that if something is not a dog it is not an animal is a statement that is not always true, even though two negations were added to the sentence.  However, to say that if something is not not a dog then it is an animal would be a statement that is always true, under a certain viewpoint, because the two times the word "not" is used, each 'not' is next to the other 'not' with no words in between"
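As an aside, the spoon, fork or knife function described above could be sketched in Python.  This is a hypothetical illustration; the function name and structure are mine, not from any library:

```python
import random

OPTIONS = ["spoon", "fork", "knife"]

def negate(choice):
    """Return a random option that is NOT the given choice.
    With three options, this 'negation' has two possible outputs,
    so applying it twice does not reliably restore the original."""
    return random.choice([item for item in OPTIONS if item != choice])

# Starting from "spoon": negate("spoon") gives fork or knife,
# and negating that result can land on any of the three options.
start = "spoon"
double_negated = {negate(negate(start)) for _ in range(1000)}
print(double_negated)  # over many trials, all three utensils appear
```

This is the point about double negation: when the "negation" merely picks something else from a list of three or more, negating twice is not guaranteed to return the original choice.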

Alexander replied, "But how do you apply this?"

Samantha replied, "It has been claimed that if P happens then Q will happen on the same day.  Since P did not happen, it has not been confirmed or proven false that Q will happen if P happens, but just because it was not proven false does not mean that it is proven true.  Since P did not happen, it also has not been confirmed or proven true that Q will happen if P happens.  The statement could be true and the statement could be false, but the statement could not be both true and false; the statement is untested and neither proven true nor proven false.  It is important not to confuse true with proven true, nor to confuse false with proven false.  It is important to remember that not false does not necessarily mean the same thing as true, nor does not true necessarily mean the same thing as false, depending on what logic system you are using.  And finally, depending on how you try to negate something twice, it does not necessarily result in ending up where you would have started with zero negations, if that is somehow related to this confusion, although I am not sure it is.  If there is a list of three or more options when someone thinks there are only two, they might choose something that is not object 1 and assume it is object 2, then choose something that is not object 2 as a second negation and assume they are going back to object 1 and undoing the second negation, when they could actually arrive at object 3 by choosing something that is not object 2.  For example, if someone says choose something that is not a spoon and they select a knife, and then says now choose something that is not the thing you just selected, they might think the only option is a spoon when it could actually be a fork."

https://en.wikipedia.org/wiki/Consequent

https://en.wikipedia.org/wiki/Antecedent_(logic)

https://en.wikipedia.org/wiki/Paradoxes_of_material_implication

https://en.wikipedia.org/wiki/Principle_of_explosion

https://en.wikipedia.org/wiki/Double_negation

https://en.wikipedia.org/wiki/Law_of_excluded_middle

https://en.wikipedia.org/wiki/Counterfactual_conditional

https://en.wikipedia.org/wiki/Vacuous_truth

https://en.wikipedia.org/wiki/Three-valued_logic

Saturday, August 24, 2024

Calculating sine of average of angles and cosine of average of angles from tangent of average of angles and other proofs for other trigonometric identities

Copyright Carl Janssen 2024 August 24

Calculating sine of average of angles and cosine of average of angles from tangent of average of angles and other proofs for other trigonometric identities

Explaining with a lot of words

It might be technically better to use the term point instead of vertex

This proof is intended for real number values, and all values plugged in are intended to be real numbers as the proof is written, but that says nothing about whether the end result will or will not work if complex or purely imaginary numbers are used

All angles mentioned in this proof before the second use of the word "solved" refer to angles measured at the origin between the horizontal line y = 0 and a second line segment in quadrant 1 of the unit circle

The angles are taken to be greater than 0 degrees and less than 90 degrees when measured at the origin.  Although this proof will in the end work for any real number angles, it is easier to visualize within quadrant 1

If you take two triangles, each on the unit circle with a unitless radius of 1, sharing the Cartesian origin (0,0) as one of their vertices and (1,0) as a second vertex, with the third or interesting vertex at (cos(angle), sin(angle))

And convert the interesting vertex of each triangle into Cartesian coordinates

Then if you take the average of the Cartesian coordinates of the two interesting vertices, one from each triangle, you will get a new vertex which is not on the unit circle and, unless the two angles are equal, has a distance less than 1 from the origin

If you draw a line segment from the origin to the new vertex created by the average just mentioned, that segment will have the same slope as a line through the origin at an angle equal to the average of the two original angles, and the length of the segment will not be 1

Thus you can use the coordinates of that new point to calculate the tangent of the average of the two angles of the original triangles on the unit circle, even though that point is not on the unit circle, so its coordinates can not directly be used to calculate the sine or cosine

If you multiply both the horizontal (X) and vertical (Y) Cartesian coordinates of the new vertex by the same constant, and select the correct constant, you get coordinates that form a line segment of length 1 from the origin with the same slope, corresponding to the tangent of the average of the two original angles, and this segment can be used to form a third triangle on the unit circle.  With this third triangle the sine and cosine can be calculated.  The cosine of the average of the two angles will be the X coordinate of this newest length 1 line segment, and the sine will be its Y coordinate, this segment having the same slope as that for the average of the two angles in the original triangles.

Explaining using more algebra and fewer words.  If you can not understand why what is being done is being done, go to the wordy section above, or to the diagram if I add it later

Given

tangent(0.5A+0.5B) = ( N * [ 0.5sin(A) + 0.5sin(B) ] ) / ( N * [ 0.5cos(A) + 0.5cos(B) ] )

sin(0.5A+0.5B) = N * [ 0.5sin(A) + 0.5sin(B) ]

cos(0.5A+0.5B) = N * [ 0.5cos(A) + 0.5cos(B) ]

Solve for both

sin(0.5A+0.5B) 

cos(0.5A+0.5B)

as functions of these four functions

cos(A), cos(B), sin(A), sin(B)

Solution

This value of N when multiplied by the coordinates of ( 0.5cos(A) + 0.5cos(B),  0.5sin(A) + 0.5sin(B) ) creates coordinates of a vertex which can form a line segment with a length of 1 from the origin (0, 0)

This line segment has a slope equal to the tangent of the angle ( 0.5A + 0.5B ).  N is calculated by using the Pythagorean theorem to find the distance from the origin (0,0) to ( 0.5cos(A) + 0.5cos(B),  0.5sin(A) + 0.5sin(B) ) and then taking the reciprocal of that distance

N= 1 / ( [ 0.5sin(A) + 0.5sin(B) ] ^ 2 + [ 0.5cos(A) + 0.5cos(B) ] ^ 2 ) ^ 0.5

sin(0.5A+0.5B) = [ 0.5sin(A) + 0.5sin(B) ] * N

cos(0.5A+0.5B) = [ 0.5cos(A) + 0.5cos(B) ] * N

Solved but needs to be simplified to get other trigonometric identities in simplified form
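The solution above can be checked numerically.  Here is a minimal sketch; the function and variable names are mine, and the check assumes angles in quadrant 1 as in the wordy section (if A and B differ by 180 degrees the averaged point is the origin and N is undefined):

```python
import math

def sin_cos_of_average(A, B):
    """Compute sin and cos of (A+B)/2 from cos(A), cos(B), sin(A), sin(B)
    by averaging the two unit-circle points and rescaling to length 1."""
    x = 0.5 * math.cos(A) + 0.5 * math.cos(B)  # averaged X coordinate
    y = 0.5 * math.sin(A) + 0.5 * math.sin(B)  # averaged Y coordinate
    N = 1.0 / math.hypot(x, y)  # reciprocal of distance from the origin
    return y * N, x * N         # (sine, cosine) of the average angle

A, B = math.radians(30), math.radians(80)
s, c = sin_cos_of_average(A, B)
assert math.isclose(s, math.sin(0.5 * (A + B)))  # matches sin(55 degrees)
assert math.isclose(c, math.cos(0.5 * (A + B)))  # matches cos(55 degrees)
```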

Solving for the square of the sine of a half angle by letting B = 0

sin(0)=0

cos(0)=1

sin(0.5A+ 0 ) ^ 2 = [ 0.5sin(A) + 0.5sin(0) ] ^ 2 * N ^2

[sin(0.5A) ]^2= [0.5sin(A)]^2 / ( [ 0.5sin(A) + 0.5sin(0) ] ^ 2 + [ 0.5cos(A) + 0.5cos(0) ] ^ 2 )

[sin(0.5A) ]^2= [0.5sin(A)]^2 / ( [ 0.5sin(A) ] ^ 2 + [ 0.5cos(A) + 0.5 ] ^ 2 )


[ 1 + cos(A) ] ^ 2 = 1 + 2cos(A) + cos(A)^2 = 2 + 2cos(A) - [sin(A)]^2

 [ 0.5cos(A) + 0.5 ] ^ 2 = 0.25 * [ 1 + cos(A) ] ^ 2 = 0.5 + 0.5cos(A) - 0.25[sin(A)]^2

[ 0.5sin(A) ] ^ 2 = 0.25[sin(A)]^2

[sin(0.5A) ]^2= [0.5sin(A)]^2 / ( [ 0.5sin(A) ] ^ 2 + 0.5 + 0.5cos(A) - 0.25[sin(A)]^2 )

The [ 0.5sin(A) ] ^ 2 and - 0.25[sin(A)]^2 terms in the denominator cancel, and note that (0.5)^2 / 0.5 = 0.5 = 1 / 2

[sin(0.5A) ]^2= [0.5sin(A)]^2 / [ 0.5 + 0.5cos(A) ] = [sin(A)]^2 /  [ 2 + 2cos(A) ]


https://www.geogebra.org/graphing


Monday, July 29, 2024

Getting something for nothing of value through money

Copyright Carl Janssen 2024

I am publishing this in 2024.  This was in my drafts from an unknown date

Below is all that was in this draft other than the title, "Getting something for nothing of value through money"

Ponzi scheme

money made out of dung

I will add the following comments

Imagine if someone gave you a piece of dung that was even less valuable than normal dung, that you did not want to eat, touch or use for tools, and that you did not even want to burn or use for fertilizer.  But they said this piece of dung had value because you could trade it for other things that had value.  They would get something valuable to you, through you, in exchange for something that has no use value to you.  You could only trade this worthless dung if you convinced other people that it has value, by recruiting new members to believe the worthless dung has value or by maintaining the current members' belief in its value.  This dung would be a Ponzi scheme, just as money is a Ponzi scheme if you can not use it for anything other than trading it.  The first person to give away this dung got something of value from someone else in exchange for losing nothing of value to them, and every person after them lost the value of whatever they traded to get the dung in exchange for a piece of dung that is of no value to them.  Each person could recover their loss by giving the piece of dung to the next person in trade, but they gained nothing of value by giving away their prized possession for the piece of dung in the first place.  Money is like a game of hot potato, where people throw a potato from one person to another and the last person holding the hot potato when the time runs out loses, except that instead of a potato you are throwing around dung that infects the minds of people who have not developed the mental immune system to see through the lie.


Involuntary taxation forbidden in Catholic Catechism

Copyright Carl Janssen - This will be published right now in 2024 - This was written at an unknown date - It was in my drafts

The Catechism of the Ultramontane Roman Catholic Church approved by "pope" John Paul 2 clearly forbids involuntary taxation which shall be referred to as taxation for short.

Most people say taxation is ok because it is for a greater good, but to paraphrase the catechism, it is a sin to do an evil action even if it is done to achieve a good result, specifically including murder, theft and lying.  Taxation is lying, in that it claims someone owes a debt for something they did not agree to and did not owe on account of causing harm to an individual or their possession of property.  Taxation is also theft, since it is taking something that rightfully belongs to someone else without their properly informed consent, as they were lied to and coerced with the threat of violence, possibly resulting in murder if they do not comply.  Taxation involves three cardinal or mortal sins that remain sins even if a good is promised in return, such as creating roads, feeding the poor, providing infrastructure, paying for research, education or healthcare, or hiring people to defend people from violence, etc.

Furthermore taxation involves an absence of the four cardinal virtues of prudence, fortitude, temperance and justice.


Taxation is an absence of prudence because it is a choice not to exercise your conscience to realize taxation is a violation of the commands not to murder, lie and steal.

Taxation is an absence of fortitude because it involves quitting before using the prudence to figure out how to get the good you want without resorting to sin, or a lack of perseverance in exercising temperance if you must deny yourself the gratification of achieving a good you want in order to avoid evil.

Taxation is an absence of justice because it involves bearing false witness, stealing from the rightful owners to give to those who do not rightfully own something, and murdering or threatening to murder people who should not be executed.

Taxation is a failure to implement the grace to live a more virtuous life.

Taxation is also an absence of the theological virtues of faith, hope and charity

It is an absence of faith as it is a failure to profess and witness to God's moral teachings and involves a lifestyle of dead faith.

Taxation is an absence of the virtue of hope because it is a failure to imagine the possibility that things could work for good if one is obedient to God's moral teachings.

Taxation is an absence of charity because murder, lying and theft are opposed to love.

Taxation involves boastful arrogance and a lack of generosity, because it is worse than giving all you have to the poor so that you may boast while lacking love.  Taxation is giving what belongs to someone else so that you may boast and feign generosity, instead of giving what belongs to you.

Taxation is a violation of the fruits of charity of joy, peace and mercy.

Taxation is a violation of joy because coveting is a killjoy

Taxation is a violation of mercy, for how much less merciful is it to harm someone who has done you no wrong than to harm someone who has done you wrong.  Taxation of all white people, or all males, or the Germans, for the alleged sins of their ancestors, as some people have suggested, is an unjust, vengeful attitude that lacks mercy.

Taxation is an absence of peace as it involves the threat of violence.  Taxation also stems from anxiety of what will happen if one can not use taxation to get the goods desired.

Taxation is oppositional to beneficence, friendship and communion, as it involves reaching out to a third party bureaucratic system to receive help instead of developing friendships, and using a third party bureaucracy as an excuse not to exercise beneficence towards your neighbor.

Taxation is an absence of benevolence as there is nothing kind about it

Taxation is a violation of the seven gifts of the spirit





Does Ayn rand endorse a lack of empathy?

Copyright Carl Janssen 2024

This was an old draft, all that was in it was the title, "Does Ayn rand endorse a lack of empathy?"

It was written at an unknown date, at or prior to 2024.  I do not remember what I was going to say about it, but it is probably a question a lot of people have.  Right now I do not really feel I have an answer to the question of whether Ayn Rand endorsed a lack of empathy in the past, and I do not know if I thought I had an answer to that question when I wrote the title.

Optimizing violence according to political maps of reality.

Copyright Carl Janssen 2024

The following was in drafts with the exact wording as follows.

"Graph as ordinal data violence imposed by the state vs violence imposed by other sources and danger, suffering and reduced pleasure imposed by lack of state services and bribes"

The following is new content that I am adding in 2024

I think I was going to make a graph of how much violence from the State different worldviews believe there would be, versus how much violence from non state sources they believe would happen at that level of State violence

The monarchists might believe that once State violence goes below a certain minimum amount, violence from non state sources would increase, thus to minimize the total amount of violence from both state and non state sources a particular non zero level of State violence is needed.  The minarchists might believe state violence must be set just right: setting it above or below a certain number would increase the violence from state sources plus non state sources

Some anarcho capitalists might believe that decreasing State violence to zero will not increase violence from non state sources compared to a higher level of State violence.  They would not believe that lowering state violence below a certain amount would guarantee an increase in violence from non state sources.  Thus some anarcho capitalists might believe that in order to minimize the total violence from state sources plus non state sources, the amount of violence from state sources should be set to 0

Other people might not care about the level of violence and simply want to maximize the amount of pleasure, comfort and or happiness they have right now, and or minimize the amount of pain, suffering and or inconvenience they have right now.  They might be willing to increase the total amount of state and non state violence they receive in order to get better service from government and non government sources

Money creates economic calculation problems

Copyright Carl Janssen

The following is in drafts.  I thought I wrote about this somewhere else in more detail, but I am publishing it.  I am publishing this in the year 2024 but I do not know what year it was written.  It has at least one typo.  This is exactly how it was written below.

https://en.m.wikipedia.org/wiki/Economic_calculation_problem


https://en.m.wikipedia.org/wiki/Lange_model
Obsfuscates

 You primarily need to know what items you want and which elements (as in the periodic table) form them and then how much energy the physical or chemical changes require.  Most of the things required for day to day living are water, calories, vitamins, protein and dietary elements as well as simple tools or shelter.  The goal is to use something similar in concept to Maslow's ladder

It would not be exponentially increasing unless you want or need an exponentially increasing number of things, although it would be potentially large if you look at the periodic table, but many elements are only needed in trace amounts.

Feminist flat shaming skinny women's bodies

https://www.chess.com/forum/view/off-topic/flat-shaming

https://web.archive.org/web/20240729201730/https://www.chess.com/forum/view/off-topic/flat-shaming

https://web.archive.org/web/20180713104703/http://theweek.com/articles/497091/australias-small-breast-ban

Special Relativity Experiments short

 Copyright Carl Janssen 2024 I do not want to delete this content or edit it to remove things but I am not going to finish it.  I will copy ...