LLM: Large Language Model
Generative Artificial Intelligence
Vector: An ordered series of numbers (weights) assigned to a word or text, placing it in a geometric space where distance reflects similarity of meaning (webvectors)
Embeddings: The numbers that compose a vector; each model computes them in its own way.
Token: The unit into which an LLM breaks sentences down before turning them into vectors; each model has its own tokenizer. As a rule of thumb, roughly 75 words ≈ 100 tokens.
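A quick way to see tokenization in action; a minimal Python sketch assuming OpenAI's tiktoken package is installed (one tokenizer among many, each model ships its own):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
    text = "Tokenization splits text into subword pieces, not whole words."
    tokens = enc.encode(text)

    print(len(text.split()), "words ->", len(tokens), "tokens")
    print([enc.decode([t]) for t in tokens])  # inspect the individual pieces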
Vector Database: A database built to store vectors and quickly compute distances between them, so the nearest neighbours of a query vector can be found
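The core operation a vector database performs; a minimal sketch with NumPy and made-up 4-dimensional embeddings (real ones have hundreds or thousands of dimensions):

    import numpy as np

    def cosine_similarity(a, b):
        # 1.0 = pointing the same way (similar meaning), 0.0 = unrelated
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy "database": hypothetical embeddings keyed by the text they encode.
    db = {
        "cat sits on the mat":  np.array([0.9, 0.1, 0.0, 0.2]),
        "dog chases the ball":  np.array([0.3, 0.9, 0.1, 0.1]),
        "quarterly tax report": np.array([0.0, 0.1, 0.9, 0.7]),
    }

    query = np.array([0.85, 0.2, 0.05, 0.15])  # pretend: embedding of "kitten on a rug"
    best = max(db, key=lambda k: cosine_similarity(query, db[k]))
    print(best)  # nearest neighbour: "cat sits on the mat"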
Transformer: A neural network architecture that transforms a sequence of embeddings into another sequence of embeddings, using attention so that each output embedding reflects its context
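The mechanism behind that transformation is attention; a minimal NumPy sketch of scaled dot-product attention (for brevity Q, K and V reuse the same matrix here; in a real transformer they come from learned projections):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        # Each output row is a weighted mix of the value rows,
        # so every embedding gets blended with its context.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        return softmax(scores) @ V

    X = np.random.default_rng(0).normal(size=(3, 4))  # 3 tokens, 4-dim embeddings
    out = attention(X, X, X)
    print(out.shape)  # (3, 4): same shape, now context-aware embeddings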
Embeddings
Corpus: The entire set of language data to be analyzed and weighted
Training phase
Parameter: A learnable weight inside the network, adjusted as the model passes over the corpus during the training phase looking for patterns (BERT: ~340M; GPT-3: 175B; GPT-4: ~1T, unofficially)
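To make "parameter" concrete; a minimal sketch counting the weights of a toy PyTorch model (assumes PyTorch is installed; real LLMs stack far larger layers):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Embedding(50_000, 512),   # vocabulary -> 512-dim embeddings
        nn.Linear(512, 2048),
        nn.ReLU(),
        nn.Linear(2048, 50_000),     # back to scores over the vocabulary
    )

    total = sum(p.numel() for p in model.parameters())
    print(f"{total:,} parameters")  # ~129M even for this toy stack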
RAG or Retrieval-Augmented Generation: A technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources
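The retrieve-then-augment loop in miniature; a sketch in which embed() is a hypothetical stand-in for a real embedding model and the documents are invented:

    import numpy as np

    def embed(text):
        # Hypothetical stand-in: a real system would call an embedding model here.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        v = rng.normal(size=8)
        return v / np.linalg.norm(v)

    documents = [
        "Refunds are processed within 14 days.",
        "Support is available Monday through Friday.",
        "The warranty covers manufacturing defects for two years.",
    ]
    doc_vectors = [embed(d) for d in documents]

    def rag_prompt(question, k=2):
        q = embed(question)
        # Retrieve: rank documents by similarity to the question.
        top = sorted(range(len(documents)),
                     key=lambda i: float(q @ doc_vectors[i]), reverse=True)[:k]
        # Augment: prepend the fetched facts so the model can ground its answer.
        context = "\n".join(documents[i] for i in top)
        return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    print(rag_prompt("How long do refunds take?"))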
Models
Probabilistic language model: A model that assigns a probability to every sentence in a language, in such a way that more likely sentences (in some sense) get higher probability
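The idea in its simplest form; a toy bigram model estimated by counting, where P(sentence) is the chain-rule product of P(word | previous word). Real LLMs use the same factorization over tokens, with a neural network in place of the counts:

    from collections import Counter

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)

    def prob(sentence):
        p = 1.0
        words = sentence.split()
        for prev, word in zip(words, words[1:]):
            p *= bigrams[(prev, word)] / unigrams[prev]  # P(word | prev)
        return p

    print(prob("the cat sat"))  # 0.25: plausible sentence, higher probability
    print(prob("the mat sat"))  # 0.0: never seen, lower probability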
Fine-tuning: A model whose weights have been adjusted on an annotated dataset so it works better for more specific cases
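The mechanics in miniature; a plain-PyTorch sketch with random stand-in data: freeze the pretrained body, then train a small task-specific head on the annotated examples:

    import torch
    import torch.nn as nn

    base = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # stand-in pretrained body
    head = nn.Linear(32, 2)                             # new task-specific layer

    for p in base.parameters():
        p.requires_grad = False  # keep the general-purpose weights frozen

    opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))  # stand-in annotated dataset

    for _ in range(100):  # tiny training loop adjusting only the head
        opt.zero_grad()
        loss = loss_fn(head(base(x)), y)
        loss.backward()
        opt.step()
    print(f"final loss: {loss.item():.3f}")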
Quantized (or distilled) LLM: A slimmed-down LLM for less powerful machines; quantization reduces the numeric precision of the weights, while distillation trains a smaller model to imitate a bigger one
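What quantization actually does; a minimal NumPy sketch of 8-bit weight quantization, storing int8 values plus one float scale for roughly 4x less memory than float32:

    import numpy as np

    w = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)

    scale = np.abs(w).max() / 127.0            # map the largest weight to 127
    q = np.round(w / scale).astype(np.int8)    # what gets stored on disk
    w_restored = q.astype(np.float32) * scale  # dequantized at inference time

    print("max error:", np.abs(w - w_restored).max())  # small precision loss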
Prompt: The input text given to the model to steer what it generates
LLM uses: human-to-human or human-to-machine communication; audio-to-text; video-to-text; text-to-text (including prose-to-code)