Tokens and Tokenization Are Important for Fundamental LLM Understanding
by Brian Wang from NextBigFuture.com
Tokens are the fundamental units that LLMs process. Instead of working with raw text (characters or whole words), LLMs convert input text into a sequence of numeric IDs called tokens using a model-specific tokenizer. A single token typically represents a common word (like "hello"), a subword ("un" + "derstanding"), a punctuation mark, or a space, and ...
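To make the text-to-token-IDs step concrete, here is a minimal sketch using OpenAI's open-source tiktoken library; the library and the "cl100k_base" vocabulary are assumptions for illustration, since the article does not name a specific tokenizer. It encodes a string into numeric IDs and decodes each ID back to its text piece.

```python
# A minimal sketch of tokenization round-tripping with tiktoken
# (library and vocabulary chosen for illustration; the article
# does not name a specific tokenizer).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common tokenizer vocabulary

text = "hello understanding"
ids = enc.encode(text)                   # text -> sequence of numeric token IDs
pieces = [enc.decode([i]) for i in ids]  # map each ID back to its text piece

print(ids)     # numeric IDs; the values are specific to this vocabulary
print(pieces)  # common words map to one token; rare words split into subwords
```

Running this shows why the same word can cost a different number of tokens under different models: the mapping from text to IDs depends entirely on the tokenizer's vocabulary, not on the text alone.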