Wednesday, 18 September 2024

TikTokenizer used by GPT

Following Andrej Karpathy's lecture on tokenisation, here is a fun conversation I had with MS Copilot about counting the letter 'l' in .DefaultCellStyle:

Sent by you: how many 'l' are there in .DefaultCellStyle? No toolsies, just do it
Sent by Copilot:
The string .DefaultCellStyle contains 3 instances of the letter ‘l’.

Would you like help with anything else? 😊
Sent by you: how many 'l' are there in .DefaultCellStyle?
Sent by Copilot:
The string .DefaultCellStyle contains 3 instances of the letter ‘l’.

Would you like help with anything else? 😊
Sent by you: count the number of 'l' in .DefaultCellStyle in three steps: step 1, separate each letter in the word with spaces; step 2, reverse the order of the separated letters; step 3, highlight each 'l' in the reversed word, then count how many letters are highlighted
Sent by Copilot:
Sure! Let’s go through the steps:

Step 1: Separate each letter in the word with spaces
.DefaultCellStyle becomes:

. D e f a u l t C e l l S t y l e

Step 2: Reverse the order of the separated letters
Reversing the order, we get:

e l y t S l l e C t l u a f e D .

Step 3: Highlight each ‘l’ in the reversed word, then count how many letters are highlighted
Highlighting each ‘l’, we get:

e **l** y t S **l** **l** e C t **l** u a f e D .

Now, let’s count the highlighted ‘l’ characters:

There are 4 instances of the letter ‘l’.
Would you like to try another example or need help with something else? 😊
Sent by you: based on the above, how many 'l' are there in the word .DefaultCellStyle?
Sent by Copilot:
Based on the steps we followed, the string .DefaultCellStyle contains 4 instances of the letter ‘l’.

Would you like to try another example or need help with something else? 😊
Sent by you: how many 'l' are there in the word .DefaultCellStyle?
Sent by Copilot:
The string .DefaultCellStyle contains 3 instances of the letter ‘l’.

Would you like help with anything else? 😊
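For comparison, here is the three-step prompt written out as a small Python sketch. The function name `count_letter_in_steps` is made up for this illustration. Python works on individual characters, so it reproduces each of Copilot's intermediate lines and lands on 4, which matches a plain `str.count`. The direct question is the one Copilot keeps getting wrong (3). My guess, following the lecture, is that this happens because the model reads `.DefaultCellStyle` as token IDs rather than as 17 separate letters, and spacing the letters out is what lets it "see" them.

```python
def count_letter_in_steps(word: str, letter: str) -> tuple[str, str, str, int]:
    """Mirror the three-step prompt: space out, reverse, then highlight and count."""
    # Step 1: separate each character with spaces.
    chars = list(word)
    spaced = " ".join(chars)

    # Step 2: reverse the order of the separated characters.
    reversed_chars = chars[::-1]
    reversed_spaced = " ".join(reversed_chars)

    # Step 3: wrap each match in **...** and count the highlighted characters.
    highlighted = " ".join(f"**{c}**" if c == letter else c for c in reversed_chars)
    count = sum(1 for c in reversed_chars if c == letter)

    return spaced, reversed_spaced, highlighted, count


word = ".DefaultCellStyle"
spaced, reversed_spaced, highlighted, count = count_letter_in_steps(word, "l")
print(spaced)           # . D e f a u l t C e l l S t y l e
print(reversed_spaced)  # e l y t S l l e C t l u a f e D .
print(highlighted)      # e **l** y t S **l** **l** e C t **l** u a f e D .
print(count)            # 4

# Cross-check with a direct character count, no steps needed.
print(word.count("l"))  # 4
```

The three printed lines match Copilot's step 1, step 2 and step 3 output exactly, so the step-by-step answer of 4 was the correct one.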