Probability

  • 15 Apr 2024 3:48 PM
    Reply # 13343338 on 13287536

    Guys,

    Just in case you are interested, I finally got this result from Python. It confirms the high probability of the name Uergilius being fortuitously present:

    import numpy as np

    # Define probabilities for each letter in a dictionary
    letter_probs = {
        'A': 8.89, 'B': 1.58, 'C': 3.99, 'D': 2.77, 'E': 11.38,
        'F': 0.93, 'G': 1.21, 'H': 0.69, 'I': 11.44, 'K': 0.01,
        'L': 3.15, 'M': 5.38, 'N': 6.28, 'O': 5.40, 'P': 3.03,
        'Q': 1.51, 'R': 6.67, 'S': 7.60, 'T': 8.00, 'U': 8.46,
        'V': 0.96, 'X': 0.60, 'Y': 0.07, 'Z': 0.01
    }

    # Normalize probabilities to sum to 1
    total_prob = sum(letter_probs.values())
    normalized_probs = {letter: prob / total_prob for letter, prob in letter_probs.items()}

    # Define the sequence to look for
    target_sequence = "VERGILIUS"


    def generate_slot_sequence(length=111):
        """Generate a random sequence of letters based on the given probabilities."""
        letters, probs = zip(*normalized_probs.items())
        return ''.join(np.random.choice(letters, size=length, p=probs))


    def check_sequence_in_slots(slots, sequence):
        """Check if the target sequence can be found within the given slots, allowing for interspersed letters."""
        seq_index = 0  # Index of the current letter in the sequence we're looking for
        for slot in slots:
            if slot == sequence[seq_index]:
                seq_index += 1
            if seq_index == len(sequence):
                return True  # Found the whole sequence
        return False


    # Simulation parameters
    n_simulations = 1000000  # Number of simulations to run
    matches_found = 0

    for _ in range(n_simulations):
        slots = generate_slot_sequence()
        if check_sequence_in_slots(slots, target_sequence):
            matches_found += 1

    # Estimate the probability
    estimated_probability = matches_found / n_simulations
    print(f"Estimated Probability: {estimated_probability}")


    Estimated probability = .044401
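The simulated figure can also be cross-checked without random sampling. What follows is only a sketch, assuming the same model as the script (111 independent letters drawn with the Eclogues frequencies). It tracks how many letters of the name have been matched so far, which makes the calculation a small Markov chain. The function name `subsequence_probability` is illustrative and does not come from the thread.

```python
# Exact probability that a target appears as an ordered (not necessarily
# contiguous) subsequence of `length` independent random letters.
# Assumption: letters are i.i.d. with the frequencies quoted in the thread.

letter_probs = {
    'A': 8.89, 'B': 1.58, 'C': 3.99, 'D': 2.77, 'E': 11.38,
    'F': 0.93, 'G': 1.21, 'H': 0.69, 'I': 11.44, 'K': 0.01,
    'L': 3.15, 'M': 5.38, 'N': 6.28, 'O': 5.40, 'P': 3.03,
    'Q': 1.51, 'R': 6.67, 'S': 7.60, 'T': 8.00, 'U': 8.46,
    'V': 0.96, 'X': 0.60, 'Y': 0.07, 'Z': 0.01,
}


def subsequence_probability(target, probs, length):
    total = sum(probs.values())
    # step[k] = chance that the next random letter is the letter needed
    # once k letters of the target have already been matched
    step = [probs[ch] / total for ch in target]
    m = len(target)
    # state[k] = probability that exactly k target letters are matched so far
    state = [1.0] + [0.0] * m
    for _ in range(length):
        new_state = [0.0] * (m + 1)
        for k in range(m):
            new_state[k] += state[k] * (1.0 - step[k])   # no progress
            new_state[k + 1] += state[k] * step[k]       # next letter matched
        new_state[m] += state[m]                         # already complete
        state = new_state
    return state[m]


exact = subsequence_probability("VERGILIUS", letter_probs, 111)
print(f"Exact probability under the i.i.d. model: {exact:.6f}")
```

If the model is the same as in the simulation, this value should sit close to the simulated estimate, differing only by Monte Carlo noise.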

    Last modified: 15 Apr 2024 3:50 PM | Nigel Statham
  • 23 Dec 2023 11:03 PM
    Reply # 13293994 on 13287536

    Thanks again John.  You have helped me a lot to get my thinking and my expression of the problem straight. This is for anyone who might be able to help further.

    I should have made the "scenario" clearer. It is:

     It is hypothesized on entirely independent grounds that the name Vergilius might be encrypted within a particular string of 111 letters that constitute a semantically and syntactically self-contained literary unit in the text of the Eclogues.   Inspection of that string brings to light that:

    1. u is the first letter of the string.

    2. The only r and g in the string follow that u.

    3. The r and the g occur in that order.

    4. An e occurs between the initial u and the r.

    5. 3 x i occur after the u e r g and before a following 2 x l.

    6. After the 2 x l only 1 x i, 1 x u and 1 x s occur.

    7. s is the last letter of the string.

    8. There seems to be a possibility that the name Vergilius may indeed have been encrypted where it was hypothesised it might be as a crude steganography.

    9. What is the probability of this particular sequence of letters occurring only randomly within the string?



  • 23 Dec 2023 7:18 PM
    Reply # 13293977 on 13287536

    Hi Nigel

    Some points, I suspect my last:

    • The calculation you used   0.0096×0.1138×0.0667×0.0121×0.1144×0.0315×0.1144×0.0846×0.076 gives for me the number 0.000000000002337.  If your result came from ChatGPT this simply confirms that a system trained on text is unlikely to be correct for arithmetic. 
    • I agree with Berwin's statements - AI is only to be trusted when its logic or the result can be checked by other means.  Which effectively means it can never be trusted.
    • However this remains the probability that a particular sequence of nine letters gives "vergilius".  Since there are many ways of choosing 9 letters in sequence from 111 (in fact 111!/9!/102!, which is roughly 5 × 10^12), we might expect the probability of finding those letters somewhere in the 111 to be somewhat greater.
    • As a statistician I would ask why these particular lines were chosen.  If the question is "what is the probability that the text contains vergilius?", asked after it has been found, then the probability becomes largely meaningless.
    • If the question was "what is the probability of finding a significant name in the text," then you would have to consider many names.  I can readily see "Marius", not a contemporary poet but a political forerunner of Vergil's patron Augustus, and I am sure there are many more.  I suspect that the probability of finding a name in a text of that length is reasonably high. 
    • This suggests that unless the question is better posed then there is little purpose in doing calculations beyond discovering how poor ChatGPT can be.
    • But I have enjoyed revisiting Latin poetry after 50+ years.

    Regards John
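The arithmetic in the first and third points can be checked directly. A minimal sketch, using only the nine frequencies quoted in the thread and Python's standard `math` module (the variable names are illustrative):

```python
import math

# Frequencies (as proportions) of v, e, r, g, i, l, i, u, s quoted in the thread
factors = [0.0096, 0.1138, 0.0667, 0.0121, 0.1144, 0.0315, 0.1144, 0.0846, 0.076]

product = math.prod(factors)          # chance that nine fixed slots spell the name
placements = math.comb(111, 9)        # ways to choose 9 ordered slots out of 111
expected_hits = product * placements  # expected number of placements that spell it

print(f"product        = {product:.4g}")
print(f"placements     = {placements:.4g}")
print(f"expected count = {expected_hits:.4g}")
```

Note that the expected count is not itself a probability: the placements overlap heavily and are far from independent, so it only indicates that a chance occurrence somewhere in the 111 letters is not astronomically unlikely.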

  • 23 Dec 2023 10:59 AM
    Reply # 13293931 on 13287536

    John,

    The suggested encryption is within the semantically and syntactically self-contained literary unit, Eclogue I lines 19-21, where it was hypothesised it might occur (Rome was where Vergil was resident and the name of the city only occurs twice in the Eclogues and in the same context!).

    VrbemquamdicuntRomamMeliboeeputauistultusegohuicnostraesimilemquosaepesolemuspastoresouiumtenerosdepellerefetus
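The string can be checked mechanically. The sketch below counts its letters and tests whether a name occurs in order, though not necessarily contiguously. Treating u and v as one letter is an assumption on my part, and the other names are simply those raised elsewhere in this thread; `normalise` and `is_subsequence` are illustrative helper names.

```python
text = ("VrbemquamdicuntRomamMeliboeeputauistultusegohuicnostraesimilem"
        "quosaepesolemuspastoresouiumtenerosdepellerefetus")


def normalise(s):
    # Assumption: u and v are treated as the same letter
    return s.lower().replace("v", "u")


def is_subsequence(name, text):
    remaining = iter(normalise(text))
    # `ch in remaining` consumes the iterator up to and including the first
    # match, so the letters of the name must be found in order
    return all(ch in remaining for ch in normalise(name))


print("letters:", len(text))
for name in ["Vergilius", "Marius", "Horatius", "Ovidius"]:
    print(name, is_subsequence(name, text))
```

The same test applied to a list of other plausible names would show how selective the rule "letters in order, gaps allowed" really is for a string of this length.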

    My starting assumptions are:

    1. A calculation (as you surmised) of the probability that the 9 letters of the name Vergilius would turn up randomly in their order in the name but not necessarily in their actual positions within the string of 111 (not 133, sorry) letters.

    2. The probability of v and the final s occurring randomly is 1 in 24 in both cases. The probability of the sequence e r g i l i u occurring randomly is a function of the number of ways that sequence can occur in between v and final s.  Is the total probability involved what chatgpt has calculated?  

    3. The calculation will only be an approximate one because there is dependence between letters within the morphology of the language (letter combinations). Would the probabilities be reduced or increased?

    4. If the name occurs where on reasonable grounds it was hypothesised that it might, the probability that it is purely coincidental is reduced.

    5.  If the name were found to occur elsewhere in the Eclogues within semantically and syntactically self-contained literary units, as it does in Eclogue 1 lines 19-21, the probability that it is purely coincidental would be further reduced.


  • 22 Dec 2023 5:58 PM
    Reply # 13293681 on 13287536

    Thanks Berwin.

    So much for that brand of AI! A couple of times when I have used it to name references it has provided completely false chapter and verse (in Rupert of Deutz and St Augustine), but I put it down to a possible lack in its configuration for that literary field. I thought it would surely be more reliable for statistical calculations.  Are there forms of AI that are more reliable for such?     

  • 22 Dec 2023 2:45 PM
    Reply # 13293666 on 13293306

    Even if the formula is correct (and that is a big if as John points out), the numerical answer is almost surely incorrect.

    AFAIK, chatgpt has no mechanism for plugging numbers into a formula that it gave you and reporting the result of such an evaluation.  As others have observed, chatgpt is 95% prediction of words and 5% hallucination.  Reported numbers, conditional on the suggestion that they are based on a provided formula, seem to be 100% hallucination.

    My guess is that in this case chatgpt has correctly guessed what your desire is and hallucinated an astronomically small number, pretending that it was the result of evaluating a certain formula. :-)

    Nigel Statham wrote:

    Duncan,

    I finally worked out how to employ chatgpt to do such calculations. It  gave me the following result (using letter frequencies in Eclogues I-X). Does the formula used seem correct?

    [...]

      


    Last modified: 22 Dec 2023 2:45 PM | Berwin Turlach
  • 22 Dec 2023 2:05 PM
    Reply # 13293663 on 13287536

    I might be missing something here, but this does not appear to be correct in two ways:

    • It is not clear how the possibility of the letters appearing in sequence somewhere in the text has been considered.  In the simplest terms, where does the calculation consider the length of the text?  Rather, it seems to be a calculation of the probability that 9 randomly chosen letters form the name Vergilius.
    • If this was the intent, the letter frequencies are not enough since some letter pairs, triples etc will appear more frequently than their letter frequencies would suggest assuming independence.  For example, many Latin male names end in "ius" (including contemporary poets Ovidius and Horatius).   

    Approaching this from a different angle, it is sensible to ask what other names appear in the text using the same rules.  I suspect there are many.

    It would be useful to have the actual text you are working with and in particular the 133 letters considered relevant.



  • 21 Dec 2023 2:13 PM
    Reply # 13293306 on 13287536

    Duncan,

    I finally worked out how to employ chatgpt to do such calculations. It  gave me the following result (using letter frequencies in Eclogues I-X). Does the formula used seem correct?

    ChatGPT

    To calculate the probability of the sequence "vergilius" occurring but not necessarily contiguously in a string of 111 letter-slots, you need to consider all the possible ways the letters can be arranged within the string. Since the order matters, you'll be calculating the probability of each individual letter in the sequence occurring in the specified order.

    The probability of the entire sequence occurring in order but not necessarily contiguously is the product of the probabilities for each individual letter:

    P("vergilius")=P(v)×P(e)×P(r)×P(g)×P(i)×P(l)×P(i)×P(u)×P(s)

    Given the frequencies of the letters:

    A 8.89 %   B 1.58 %   C 3.99 %   D 2.77 %   E 11.38 %
    F 0.93 %   G 1.21 %   H 0.69 %   I 11.44 %  K 0.01 %
    L 3.15 %   M 5.38 %   N 6.28 %   O 5.40 %   P 3.03 %
    Q 1.51 %   R 6.67 %   S 7.60 %   T 8.00 %   U 8.46 %
    V 0.96 %   X 0.60 %   Y 0.07 %   Z 0.01 %

    (For A: 1/x = 8.89/100, x = 11.24)

    Now, multiply these probabilities:

    P("vergilius")=0.0096×0.1138×0.0667×0.0121×0.1144×0.0315×0.1144×0.0846×0.076

    Calculate the result:

    P("vergilius")≈0.0000000625

    So, the chance of the sequence "vergilius" occurring but not necessarily contiguously in a string of 111 letter-slots with the given frequencies is approximately 6.25×10^-8, or very low.


      

  • 12 Dec 2023 7:27 PM
    Reply # 13289967 on 13287536

    Thanks John and Duncan for your replies, which encouraged me to do a  count of Z in an extended body of Latin text of 595,024 letters from the same period --computer assisted! What I found, which I'm sure will reduce the probability greatly, is that out of the 95 instances of Z that turned up in that text only one consists of a syntactically and semantically self-contained unit, and that is the one that occurs in Y as hypothesized that it might.  Would you be so kind as to let me know how to use chatgpt to do the calculation. 

    Nigel.


  • 7 Dec 2023 8:14 AM
    Reply # 13287950 on 13287536

    I don't know why this fascinates me (for a while) but the sequence Vergellius only occurs once in (sorry about fonts, trying to make it more compact)

    M. Tityre, tu patulae recubans sub tegmine fagi 1.1

    siluestrem tenui Musam meditaris auena;

    nos patriae finis et dulcia linquimus arua.

    nos patriam fugimus; tu, Tityre, lentus in umbra

    formosam resonare doces Amaryllida siluas. 5

    T. O Meliboee, deus nobis haec otia fecit.

    namque erit ille mihi semper deus, illius aram

    saepe tener nostris ab ouilibus imbuet agnus.

    ille meas errare boues, ut cernis, et ipsum

    ludere quae uellem calamo permisit agresti. 10

    M. Non equidem inuideo, miror magis: undique totis

    usque adeo turbatur agris. en ipse capellas

    protinus aeger ago; hanc etiam uix, Tityre, duco.

    hic inter densas corylos modo namque gemellos,

    spem gregis, a! silice in nuda conixa reliquit. 15

    saepe malum hoc nobis, si mens non laeua fuisset,

    de caelo tactas memini praedicere quercus.

    sed tamen iste deus qui sit, da, Tityre, nobis.

    T. Vrbem quam dicunt Romam, Meliboee, putaui

    stultus ego huic nostrae similem, quo saepe solemus 20

    pastores ouium teneros depellere fetus.

    sic canibus catulos similis, sic matribus haedos

    noram, sic paruis componere magna solebam.

    uerum haec tantum alias inter caput extulit urbes

    quantum lenta solent inter uiburna cupressi. 25

    M. Et quae tanta fuit Romam tibi causa uidendi?

    T. Libertas, quae sera tamen respexit inertem,

    candidior postquam tondenti barba cadebat,

    respexit tamen et longo post tempore uenit,

    postquam nos Amaryllis habet, Galatea reliquit. 30

    namque (fatebor enim) dum me Galatea tenebat,

    nec spes libertatis erat nec cura peculi.

    quamuis multa meis exiret uictima saeptis,

    pinguis et ingratae premeretur caseus urbi,

    non umquam grauis aere domum mihi dextra redibat. 35

    M. Mirabar quid maesta deos, Amarylli, uocares,

    cui pendere sua patereris in arbore poma;

    Tityrus hinc aberat. ipsae te, Tityre, pinus,

    ipsi te fontes, ipsa haec arbusta uocabant.

    T. Quid facerem? neque seruitio me exire licebat 40

    nec tam praesentis alibi cognoscere diuos.

    hic illum uidi iuuenem, Meliboee, quotannis

    bis senos cui nostra dies altaria fumant.

    hic mihi responsum primus dedit ille petenti:

    'pascite ut ante boues, pueri; summittite tauros.' 45

    M. Fortunate senex, ergo tua rura manebunt

    et tibi magna satis, quamuis lapis omnia nudus

    limosoque palus obducat pascua iunco:

    non insueta grauis temptabunt pabula fetas,

    nec mala uicini pecoris contagia laedent. 50

    fortunate senex, hic inter flumina nota

    et fontis sacros frigus captabis opacum;

    hinc tibi, quae semper, uicino ab limite saepes

    Hyblaeis apibus florem depasta salicti

    saepe leui somnum suadebit inire susurro; 55

    hinc alta sub rupe canet frondator ad auras,

    nec tamen interea raucae, tua cura, palumbes

    nec gemere aëria cessabit turtur ab ulmo.

    T. Ante leues ergo pascentur in aethere cerui

    et freta destituent nudos in litore piscis, 60

    ante pererratis amborum finibus exsul

    aut Ararim Parthus bibet aut Germania Tigrim,

    quam nostro illius labatur pectore uultus.

    M. At nos hinc alii sitientis ibimus Afros,

    pars Scythiam et rapidum cretae ueniemus Oaxen 65

    et penitus toto diuisos orbe Britannos.

    en umquam patrios longo post tempore finis

    pauperis et tuguri congestum caespite culmen,

    post aliquot, mea regna, uidens mirabor aristas?

    impius haec tam culta noualia miles habebit, 70

    barbarus has segetes. en quo discordia ciuis

    produxit miseros: his nos conseuimus agros!

    insere nunc, Meliboee, piros, pone ordine uitis.

    ite meae, felix quondam pecus, ite capellae.

    non ego uos posthac uiridi proiectus in antro 75

    dumosa pendere procul de rupe uidebo;

    carmina nulla canam; non me pascente, capellae,

    florentem cytisum et salices carpetis amaras.

    T. Hic tamen hanc mecum poteras requiescere noctem

    fronde super uiridi: sunt nobis mitia poma, 80

    castaneae molles et pressi copia lactis,

    et iam summa procul uillarum culmina fumant

    maioresque cadunt altis de montibus umbrae.

    Ref.: https://latin.packhum.org/loc/690/1/0#0

    My first inclination was that it sounded quite reasonable to happen by chance. Who would know. That poem has around 3000 characters (not including spaces), around 500 words and Vergellius is 10 letters I think. But the one occurrence of it is in a fairly small chunk. The V is very rare. The sequence ergellius is very common. So common I gave up counting but I guess between 12-18 times in 3000 letters. 15 to be precise, reasonably evenly distributed throughout etc. Now to estimate the chance of a chunk landing exactly in another chunk etc. I have been asking ChatGPT to help a bit but given the frequency of the sequence within the text being quite high the probability of it landing within another slightly larger chunk is remarkably high. Not confident

    For the original problem, given that the sequence occurs around 9 times ChatGPT has estimated a probability of around 38% of it landing in any specific 133 letter chunk. It even gave me a chunk calculator :) But I needed to do my little sample and counting exercise to even conceptualise the problem. For a rarer sequence of letters (eg Vergellius) which only occurs once the chance of it landing in the 133 letter chunk is around 5%

    It even suggested similarities with the birthday problem*

    *Lots of simplifying assumptions have been made
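For what it is worth, the chunk figures can be roughly reproduced with a very simple model. This is a sketch under my own simplifying assumptions, not the calculation ChatGPT produced: each occurrence of a sequence lands uniformly at random in a poem of about 3000 letters, independently of the others, and the function name is illustrative.

```python
# Simplified model (an assumption, not ChatGPT's calculation): each occurrence
# of a sequence lands uniformly at random in the poem, independently.
poem_length = 3000    # approximate letter count quoted in the post
chunk_length = 133    # chunk size used in the post


def chance_chunk_contains(occurrences, chunk=chunk_length, total=poem_length):
    miss = 1 - chunk / total          # one occurrence falls outside the chunk
    return 1 - miss ** occurrences    # at least one occurrence falls inside


print(f"1 occurrence : {chance_chunk_contains(1):.3f}")
print(f"9 occurrences: {chance_chunk_contains(9):.3f}")
```

Under these assumptions the figures come out at roughly 4% and 34%, in the same region as the 5% and 38% quoted above.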

    And I am still not confident of earlier analysis but my gut feel suggests that it is not a minute chance. And I think the problem seems much more complicated than it is. If you can assume a sequence happens randomly approximately 9 times in chunks of around 130 etc. Of course you would need to know how likely that sequence was in regular Latin poems by chance. Sorry for another edit but my confidence in its lack of rarity is growing by the minute.  But I am pondering old Vergellius only having one V in the poem. But I never studied the classics and only remember one old Latin prayer without even knowing what it meant

    Last modified: 7 Dec 2023 11:50 AM | Duncan Lowes