I'm not sure where to post this, but I was thinking about the D'Agapeyeff Cipher in the data acquisition, so feel free to move it somewhere more appropriate if you think that's necessary.
I was looking at D'Agapeyeff and noticed that, aside from some of the curiosities already mentioned, the share of two-digit pairs in the 80s is about 40% (80 of 196 pairs, or 40.82%), which is a pretty significant portion. Consider that the 5 most common letters in English (ETAOI or ETAON) account for about 44%, just 4 points more than D'Agapeyeff's 80s. 40% is also a good approximation for the share of vowels, but if the 80s are vowels they're not very nicely distributed (the best ratio I could find in this case was using a diagonal encoding).
That got me thinking about the solution to part 3 of Kryptos and the possibility of brute-forcing a double columnar transposition by scoring the frequencies of vowels against consonants. I toyed in Rumkin for a few minutes with the idea of just looking at vowel distribution in a cipher and found it could quite rapidly rule out many transpositions.
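To make the vowel-distribution idea concrete, here is a rough sketch of one way such a filter could work. It is not what Rumkin does internally; the function names, the row-by-row grid layout, and the 0.38 target (rounded from the War and Peace vowel share) are all my own assumptions. The idea is that a correct untransposition should give rows whose vowel share hovers near the language average, so a large spread is a reason to discard a candidate.

```python
VOWELS = set("AEIOU")  # assumption: Y counts as a consonant


def row_vowel_fractions(text, width):
    """Write the letters of text row by row into a grid of the given width
    and return the vowel fraction of each row."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    rows = [letters[i:i + width] for i in range(0, len(letters), width)]
    return [sum(c in VOWELS for c in row) / len(row) for row in rows]


def vowel_spread(text, width, expected=0.38):
    """Mean squared deviation of the row vowel fractions from the expected
    share. Lower values mean the vowels are spread more evenly."""
    fractions = row_vowel_fractions(text, width)
    return sum((f - expected) ** 2 for f in fractions) / len(fractions)
```

A brute-force search would call vowel_spread on each candidate decryption and keep only the lowest-scoring ones for closer inspection.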
So, the data.
For this particular analysis I used Tolstoy's War and Peace from Project Gutenberg. I deleted the web address (but left the remainder of their disclaimer) and ignored numbers and so forth. Here's how it looks (all frequencies are percentages):
VOWELS-952248 FREQUENCY = 37.665013
CONSONANTS-1575955 FREQUENCY = 62.334987
SETS EXAMINED = 1638135
[1 VOWEL] -690502 FREQUENCY = 42.151715
[1 CNSNT] -322724 FREQUENCY = 19.700696
[2 CNSNTS]-299387 FREQUENCY = 18.276088
[3 CNSNTS]-144734 FREQUENCY = 8.835291
[2 VOWELS]-123408 FREQUENCY = 7.533445
[4 CNSNTS]-42574 FREQUENCY = 2.598931
[5 CNSNTS]-7825 FREQUENCY = 0.477677
[3 VOWELS]-4818 FREQUENCY = 0.294115
DIGRAMS
[1 VOWEL] [1 CNSNT] -279661 FREQUENCY = 17.071925
[1 CNSNT] [1 VOWEL] -271252 FREQUENCY = 16.558597
[2 CNSNTS][1 VOWEL] -254091 FREQUENCY = 15.511002
[1 VOWEL] [2 CNSNTS]-251423 FREQUENCY = 15.348134
[3 CNSNTS][1 VOWEL] -121665 FREQUENCY = 7.427048
[1 VOWEL] [3 CNSNTS]-117017 FREQUENCY = 7.143311
[1 CNSNT] [2 VOWELS]-49202 FREQUENCY = 3.003539
[2 VOWELS][2 CNSNTS]-46132 FREQUENCY = 2.816131
[2 CNSNTS][2 VOWELS]-43720 FREQUENCY = 2.66889
[2 VOWELS][1 CNSNT] -41421 FREQUENCY = 2.528548
[4 CNSNTS][1 VOWEL] -35779 FREQUENCY = 2.184131
[1 VOWEL] [4 CNSNTS]-34829 FREQUENCY = 2.126139
[2 VOWELS][3 CNSNTS]-26717 FREQUENCY = 1.630941
[3 CNSNTS][2 VOWELS]-22054 FREQUENCY = 1.346288
[2 VOWELS][4 CNSNTS]-7246 FREQUENCY = 0.442333
[4 CNSNTS][2 VOWELS]-6589 FREQUENCY = 0.402226
[5 CNSNTS][1 VOWEL] -6263 FREQUENCY = 0.382325
[1 VOWEL] [5 CNSNTS]-6151 FREQUENCY = 0.375488
[1 CNSNT] [3 VOWELS]-2000 FREQUENCY = 0.12209
[3 VOWELS][2 CNSNTS]-1749 FREQUENCY = 0.106768
TRIGRAMS
[1 VOWEL] [1 CNSNT] [1 VOWEL] -232652 FREQUENCY = 14.202266
[1 VOWEL] [2 CNSNTS][1 VOWEL] -212995 FREQUENCY = 13.002302
[1 CNSNT] [1 VOWEL] [1 CNSNT] -107219 FREQUENCY = 6.545195
[2 CNSNTS][1 VOWEL] [1 CNSNT] -99707 FREQUENCY = 6.086624
[1 VOWEL] [3 CNSNTS][1 VOWEL] -98014 FREQUENCY = 5.983275
[1 CNSNT] [1 VOWEL] [2 CNSNTS]-97573 FREQUENCY = 5.956354
[2 CNSNTS][1 VOWEL] [2 CNSNTS]-94218 FREQUENCY = 5.751548
[3 CNSNTS][1 VOWEL] [1 CNSNT] -53462 FREQUENCY = 3.263593
[1 CNSNT] [1 VOWEL] [3 CNSNTS]-47568 FREQUENCY = 2.903794
[1 VOWEL] [1 CNSNT] [2 VOWELS]-44949 FREQUENCY = 2.743916
[2 CNSNTS][1 VOWEL] [3 CNSNTS]-44900 FREQUENCY = 2.740925
[3 CNSNTS][1 VOWEL] [2 CNSNTS]-44190 FREQUENCY = 2.697583
[2 VOWELS][2 CNSNTS][1 VOWEL] -39532 FREQUENCY = 2.413235
[2 VOWELS][1 CNSNT] [1 VOWEL] -37213 FREQUENCY = 2.271671
[1 VOWEL] [2 CNSNTS][2 VOWELS]-37090 FREQUENCY = 2.264163
[1 VOWEL] [4 CNSNTS][1 VOWEL] -29204 FREQUENCY = 1.782761
[2 VOWELS][3 CNSNTS][1 VOWEL] -22866 FREQUENCY = 1.395857
[1 VOWEL] [3 CNSNTS][2 VOWELS]-18137 FREQUENCY = 1.107175
[1 CNSNT] [2 VOWELS][2 CNSNTS]-18084 FREQUENCY = 1.10394
[3 CNSNTS][1 VOWEL] [3 CNSNTS]-17804 FREQUENCY = 1.086847
[1 CNSNT] [2 VOWELS][1 CNSNT] -16870 FREQUENCY = 1.029831
[2 CNSNTS][2 VOWELS][2 CNSNTS]-16105 FREQUENCY = 0.983131
[4 CNSNTS][1 VOWEL] [1 CNSNT] -15977 FREQUENCY = 0.975318
[1 CNSNT] [1 VOWEL] [4 CNSNTS]-15595 FREQUENCY = 0.951998
[2 CNSNTS][2 VOWELS][1 CNSNT] -14441 FREQUENCY = 0.881552
[4 CNSNTS][1 VOWEL] [2 CNSNTS]-12684 FREQUENCY = 0.774296
[2 CNSNTS][1 VOWEL] [4 CNSNTS]-12536 FREQUENCY = 0.765261
[1 CNSNT] [2 VOWELS][3 CNSNTS]-10774 FREQUENCY = 0.6577
[2 CNSNTS][2 VOWELS][3 CNSNTS]-10192 FREQUENCY = 0.622172
[3 CNSNTS][2 VOWELS][2 CNSNTS]-8744 FREQUENCY = 0.533778
[3 CNSNTS][2 VOWELS][1 CNSNT] -7387 FREQUENCY = 0.45094
[2 VOWELS][2 CNSNTS][2 VOWELS]-6369 FREQUENCY = 0.388796
[2 VOWELS][4 CNSNTS][1 VOWEL] -6153 FREQUENCY = 0.375611
[4 CNSNTS][1 VOWEL] [3 CNSNTS]-5525 FREQUENCY = 0.337274
[1 VOWEL] [4 CNSNTS][2 VOWELS]-5462 FREQUENCY = 0.333428
[3 CNSNTS][1 VOWEL] [4 CNSNTS]-5092 FREQUENCY = 0.310842
[1 VOWEL] [5 CNSNTS][1 VOWEL] -4906 FREQUENCY = 0.299487
[3 CNSNTS][2 VOWELS][3 CNSNTS]-4155 FREQUENCY = 0.253642
[2 VOWELS][1 CNSNT] [2 VOWELS]-4048 FREQUENCY = 0.247111
[2 VOWELS][3 CNSNTS][2 VOWELS]-3707 FREQUENCY = 0.226294
[1 CNSNT] [2 VOWELS][4 CNSNTS]-2723 FREQUENCY = 0.166226
[1 CNSNT] [1 VOWEL] [5 CNSNTS]-2706 FREQUENCY = 0.165188
[5 CNSNTS][1 VOWEL] [1 CNSNT] -2651 FREQUENCY = 0.161831
[4 CNSNTS][2 VOWELS][2 CNSNTS]-2607 FREQUENCY = 0.159145
[2 CNSNTS][2 VOWELS][4 CNSNTS]-2396 FREQUENCY = 0.146264
[5 CNSNTS][1 VOWEL] [2 CNSNTS]-2265 FREQUENCY = 0.138267
[2 CNSNTS][1 VOWEL] [5 CNSNTS]-2256 FREQUENCY = 0.137718
[4 CNSNTS][2 VOWELS][1 CNSNT] -2069 FREQUENCY = 0.126302
[1 VOWEL] [1 CNSNT] [3 VOWELS]-1815 FREQUENCY = 0.110797
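For anyone who wants to reproduce counts like these on another text, here is a minimal sketch of how the run statistics could be tallied. It is not the program that produced the numbers above. It assumes that non-letters are dropped so runs can cross word boundaries (the convention actually used is not stated), that Y is a consonant, and the names run_labels and run_ngrams are made up for the example.

```python
from collections import Counter
from itertools import groupby

VOWELS = set("AEIOU")  # assumption: Y counts as a consonant


def run_labels(text):
    """Label each maximal vowel or consonant run, e.g. '1V' or '3C'.
    Non-letters are dropped, so runs may span word boundaries."""
    letters = (c for c in text.upper() if "A" <= c <= "Z")
    return [
        f"{len(list(group))}{'V' if is_vowel else 'C'}"
        for is_vowel, group in groupby(letters, key=lambda c: c in VOWELS)
    ]


def run_ngrams(text, n):
    """Count sequences of n consecutive runs (n=2 for the digrams,
    n=3 for the trigrams)."""
    labels = run_labels(text)
    return Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
```

Dividing each count by the total number of runs and multiplying by 100 would give percentages in the same form as the lists above.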
For those curious, here are Tolstoy's letter frequencies:
E-314818 FREQUENCY = 12.452244
T-226014 FREQUENCY = 8.939709
A-205430 FREQUENCY = 8.125534
O-192828 FREQUENCY = 7.627077
N-184154 FREQUENCY = 7.283988
I-173748 FREQUENCY = 6.872391
H-167028 FREQUENCY = 6.60659
S-162882 FREQUENCY = 6.4426
R-148052 FREQUENCY = 5.856017
D-118277 FREQUENCY = 4.678303
L-96514 FREQUENCY = 3.817494
U-65424 FREQUENCY = 2.587767
M-61642 FREQUENCY = 2.438174
C-61258 FREQUENCY = 2.422986
W-59198 FREQUENCY = 2.341505
F-54886 FREQUENCY = 2.170949
G-51315 FREQUENCY = 2.029703
Y-46264 FREQUENCY = 1.829916
P-45162 FREQUENCY = 1.786328
B-34641 FREQUENCY = 1.370183
V-26901 FREQUENCY = 1.064036
K-20415 FREQUENCY = 0.807491
X-4060 FREQUENCY = 0.160588
J-2574 FREQUENCY = 0.101811
Z-2388 FREQUENCY = 0.094454
Q-2330 FREQUENCY = 0.09216
Currently, I'm trying to determine whether there is a way to transcribe the 2-digit pairs into a 14x14 grid that evenly distributes the 80s, though I've not ruled out double transposition (obviously).
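As a rough sketch of how "evenly distributes" could be checked: with 80 pairs in the 80s spread over 14 rows, a perfectly even layout would put about 5.7 of them in each row and each column. The helper below only handles a plain row-by-row fill; the function name is hypothetical, and any other transcription order (diagonal, spiral, and so on) would need the pairs reordered before calling it.

```python
def eighties_per_line(pairs, width=14):
    """pairs: two-digit strings in the order they are written into the grid.
    Returns (row_counts, col_counts) of pairs whose first digit is 8."""
    grid = [pairs[i:i + width] for i in range(0, len(pairs), width)]
    row_counts = [sum(p[0] == "8" for p in row) for row in grid]
    col_counts = [
        sum(row[c][0] == "8" for row in grid if c < len(row))
        for c in range(width)
    ]
    return row_counts, col_counts


# Placeholder input only, NOT the real ciphertext.
rows, cols = eighties_per_line(["81", "62", "75", "82"], width=2)
```

Comparing the largest and smallest entries of each list against 5.7 gives a quick feel for how lumpy a given transcription is.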
Oh, also, based on the letter frequency chart above, here is what we should expect out of 196 characters using Tolstoy's letter frequencies. The last column lists the actual digit pairs and their counts, ranked from most to least frequent:
Letter Expected Actual (pair-count)
E 24.4 81-20
T 17.5 62-17
A 15.9 75-17
O 14.9 82-17
N 14.3 85-17
I/J 13.7 64-16
H 12.9 83-15
S 12.6 74-14
R 11.5 63-12
D 9.2 91-12
L 7.5 65-11
U 5.1 84-11
M 4.8 72-9
C 4.8 92-3
W 4.6 93-2
F 4.3 94-1
G 4.0 71-1
Y 3.6 04-1
P 3.5 61-0
B 2.7 73-0
V 2.1 95-0
K 1.6 01-0
X 0.3 02-0
Z 0.2 03-0
Q 0.2 05-0
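The expected column is just 196 times each letter's share of the War and Peace tally, so it can be recomputed with a few lines. This is a sketch using the counts copied from the letter list earlier in this post; the variable and function names are only illustrative.

```python
# Letter counts copied from the War and Peace tally in this post.
COUNTS = {
    "E": 314818, "T": 226014, "A": 205430, "O": 192828, "N": 184154,
    "I": 173748, "H": 167028, "S": 162882, "R": 148052, "D": 118277,
    "L": 96514, "U": 65424, "M": 61642, "C": 61258, "W": 59198,
    "F": 54886, "G": 51315, "Y": 46264, "P": 45162, "B": 34641,
    "V": 26901, "K": 20415, "X": 4060, "J": 2574, "Z": 2388, "Q": 2330,
}
N_PAIRS = 196  # number of two-digit pairs in the cipher


def expected_counts(counts, n):
    """Scale each letter's share of the corpus to a message of n characters."""
    total = sum(counts.values())
    return {letter: n * c / total for letter, c in counts.items()}


expected = expected_counts(COUNTS, N_PAIRS)
i_and_j = expected["I"] + expected["J"]  # I and J share one row in the table

# Share of pairs in the 80s, from the counts for 81, 82, 85, 83 and 84.
eighties = 20 + 17 + 17 + 15 + 11
share = 100 * eighties / N_PAIRS
```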