Again, for 20 flips, there are 17 sets of 4 consecutive flips (1-4, 2-5,...,17-20), each of which is dependent on the set before it.
That leaves 1 set that is independent(1-4), 1 set that is dependent on one other set(2-5), and 15 sets that are dependent on two other sets.
In order for the first 2 sets to be different, the first 5 flips can have 13 of the 16 possible configurations (same is S, D is different)
SSDS
SSDD
SDSS
SDSD
SDDS
SDDD
DSSD
DSDS
DSDD
DDSS
DDSD
DDDS
DDDD
of these combinations, only the last 2 flips influence the next groups, thus there are four possible endings:
SS = 2/16 with no set the same, 2/16 with at least one set the same
SD = 3/16 with no set the same, 1/16 with at least one set the same
DS = 4/16 with no set the same
DD = 4/16 with no set the same
This tells us there are six unique groups of first 5 flips:
1) first two sets are different, ending with SS = 2/16
2) first two sets are different, ending with SD = 3/16
3) first two sets are different, ending with DS = 4/16
4) first two sets are different, ending with DD = 4/16
5) at least one of first two sets are same, ending with SS = 2/16
6) at least one of first two sets are same, ending with SD = 1/16
by choosing between these 4 endings, we can see what probabilities develop in the remaining 15 flips.
EDIT: continuing the last thought,
in each of the remaining 15 flips, we operate off one of 4 endings:
SS:
A) next flip is the same, resulting in a set that is the same, with the ending SS = 1/2
B) next flip is different, resulting in a set that is different, with the ending SD = 1/2
SD:
A) next flip is the same, resulting in a set that is the different, with the ending DS = 1/2
B) next flip is different, resulting in a set that is different, with the ending DD = 1/2
DS:
A) next flip is the same, resulting in a set that is the different, with the ending SS = 1/2
B) next flip is different, resulting in a set that is different, with the ending SD = 1/2
DD:
A) next flip is the same, resulting in a set that is the different, with the ending DS = 1/2
B) next flip is different, resulting in a set that is different, with the ending DD = 1/2
With a 50% chance of A and B happening for each scenario.
So what we have now is a tree with six top level nodes where every node has exactly two child nodes with each node having a 50% probability (A and B for each ending). There are 15 sets of two child nodes, so for every level down the tree, the probability decreases by a factor of 1/2, until at the bottom of the tree, the total factor is (1/2)^15. This tree is used to solve for the probability for the condition that at least 4 flips are the same (SSS, or an SS node that is a child of another SS node).
Let's first revisit the 7 flips scenario, where the tree has 2 layers below the top level, looking something like this (XX) is a node without the condition, {XX} is a node with the condition:
(SS) (SD) (DS) (DD) {SS} {SD}
{SS} (SD) (DS) (DD) (SS) (SD) (DS) (DD) {SS} {SD} {DS} {DD}
{SS} {SD} (DS) (DD) (SS) (SD) (DS) (DD) {SS} (SD) (DS) (DD) (SS) (SD) (DS) (DD) {SS} {SD} {DS} {DD} {SS} {SD} {DS} {DD}
There are some definitive patterns, but the probabilities depend on how many layers are below the top level. For example, let's follow the (SS) node down 4 layers instead of 2:
(SS)
{SS} (SD)
{SS} {SD} (DS) (DD)
{SS} {SD} {DS} {DD} (SS) (SD) (DS) (DD)
{SS} {SD} {DS} {DD} {SS} {SD} {DS} {DD} {SS} (SD) (DS) (DD) (SS) (SD) (DS) (DD)
at row 1, 1/2 of the nodes have met the condition.
at row 2, 2/4 of the nodes have met the condition.
at row 3, 4/8 of the nodes have met the condition.
at row 4, 9/16 of the nodes have met the condition.
how many nodes will have met the condition at row 5? The answer should be 19/32.
how many nodes will have met the condition at row 6? The answer should be 40/64.
This is looking like a matrix problem, I just have to find a way to structure it correctly. Hopefully I figure it out tomorrow!
To represent this problem, a 5x5 matrix will be constructed for the dependency of each of the endings and the condition, such that multiplying this matrix by a vector containing the current amount of each type of node will give the amount of each type of node at the next level.
Here is the format:
Matrix A Vector x
0 0 1 0 0 ^ n (SS)
1 0 1 0 0 (SD)
0 1 0 1 0 * (DS)
0 1 0 1 0 (DD)
1 0 0 0 2 {XX}
now let's check for the (SS) case worked out above, where x = [1 0 0 0 0]':
row 1 = A*x = [0 1 0 0 1]'
row 2 = A*A*x = A^2*x = [0 0 1 1 2]'
row 15 = A^15*x = [927 1431 1705 1705 27000]'
thus in row 15, 27000/32768, or 84.20% of the nodes meet the condition.
So now this powerful matrix has been discovered, the problem can be approached slightly differently:
After the first two coin flips, we end up with 4 top level nodes of equal probability, one for each of the 4 endings, and the condition hasn't been met at any of them. That leaves 17 more nodes. Using the matrix with n=17 and x = [1 1 1 1 0]':
A^17*x = [19513 30122 35890 35890 402873]
Thus at the last row, 402873/524288, or 76.84% of the nodes meet the condition.
Just for fun, to see what the probability is after 50 flips, we would use n=47 and the same x as before:
A&47*x = 10^14*[0.0170 0.0262 0.0312 0.0312 5.5239]
meaning that the probability of the condition being met after 50 flips is 98.12%