by John on (#6VSG4)
I mentioned in the previous post that not every Unicode character corresponds to a token in ChatGPT. Specifically, I'm looking at the gpt-3.5-turbo tokenizer in tiktoken. There are 100,256 possible tokens and 155,063 Unicode characters, so by the pigeonhole principle not every character can have a token of its own. I was curious about the relationship between tokens and [...]

The post ChatGPT tokens and Unicode first appeared on John D. Cook.
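As a rough illustration of the pigeonhole argument, here is a minimal sketch that counts the characters a tokenizer encodes as exactly one token. It takes the encoder as a plain function, so it can be exercised with a hypothetical byte-level toy tokenizer (`byte_level_encode`, 256 tokens) instead of the real one. Which code points count as "characters" is an assumption here: anything outside the general categories Cn (unassigned), Cs (surrogates), and Co (private use). The total depends on the Unicode version bundled with Python and need not match the 155,063 figure above. The helper names are illustrative, not from the post.

```python
import sys
import unicodedata
from typing import Callable, Iterable, Iterator, List

# Assumption for this sketch: code points in these general categories are not
# counted as characters. Cn = unassigned, Cs = surrogates, Co = private use.
EXCLUDED_CATEGORIES = {"Cn", "Cs", "Co"}


def assigned_characters() -> Iterator[str]:
    """Yield every code point that Python's unicodedata treats as a character.

    The count depends on the Unicode version bundled with this Python build.
    """
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) not in EXCLUDED_CATEGORIES:
            yield ch


def single_token_chars(encode: Callable[[str], List[int]],
                       chars: Iterable[str]) -> List[str]:
    """Return the characters that `encode` maps to exactly one token."""
    return [ch for ch in chars if len(encode(ch)) == 1]


def byte_level_encode(text: str) -> List[int]:
    """Hypothetical toy tokenizer: one token per UTF-8 byte (256 tokens)."""
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    singles = single_token_chars(byte_level_encode, assigned_characters())
    # For a lossless encoding, distinct single-token characters need distinct
    # tokens, so by the pigeonhole principle there are at most 256 of them here.
    print(len(singles))  # prints 128: only ASCII fits in one UTF-8 byte
```

To apply the same count to the real tokenizer, with tiktoken installed one could pass `tiktoken.encoding_for_model("gpt-3.5-turbo").encode` as `encode` and compare the result against the encoding's `n_vocab`; the number of single-token characters can never exceed the vocabulary size.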