AI Glossary

LLM: Large Language Model; a neural network trained on huge amounts of text to predict and generate language

Generative AI (Generative Artificial Intelligence): Models that create new content, such as text, images, or audio, rather than only classifying or labeling existing data

Vector: A series of numbers assigned to a word or text; taken together, the vectors form a geometry in which distance reflects similarity of meaning (webvectors)

Embeddings: The numbers that make up those vectors. Each model computes them in its own way.
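
A minimal sketch of what "vector" and "embedding" mean in practice. It assumes the sentence-transformers library and its all-MiniLM-L6-v2 model (both my choice of example, not part of this glossary): the model maps a sentence to a fixed-length array of numbers.

```python
# Turn a sentence into an embedding vector.
# Assumes: sentence-transformers installed, 'all-MiniLM-L6-v2' model
# (a small, common choice; any embedding model works the same way).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("The cat sat on the mat.")  # NumPy array of floats

print(vec.shape)  # (384,) -- this model emits 384-dimensional vectors
print(vec[:5])    # the first few of the numbers that compose the vector
```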

Token: The unit an LLM breaks sentences into before computing vectors. Each model tokenizes its own way; as a rough rule of thumb, 100 tokens ≈ 75 words of English.
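
A sketch of tokenization, assuming OpenAI's tiktoken library and its cl100k_base encoding (the one used by the GPT-3.5/GPT-4 era models); other models ship their own tokenizers.

```python
# Count words vs. tokens for one sentence.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Tokenization splits sentences into model-specific pieces."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
print(enc.decode(tokens[:3]))  # tokens map back to fragments of text
```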

Vector Database: A database built to store vectors and quickly compute distances (similarity) between them
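
A sketch of the calculation a vector database performs, done by brute force with NumPy. Real vector databases add indexes (e.g., HNSW) so they can skip the full scan; the shapes and data here are made up for illustration.

```python
# Rank stored vectors by cosine distance to a query vector.
import numpy as np

docs = np.random.rand(1000, 384)  # 1000 stored document vectors
query = np.random.rand(384)       # one query vector

# Cosine similarity, then distance = 1 - similarity
sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
nearest = np.argsort(1 - sims)[:5]  # indices of the 5 closest vectors
print(nearest)
```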

Transformer: A neural-network architecture that maps a set of word embeddings to a new set of embeddings, using attention so that each position draws context from all the others
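
The heart of that transformation is attention, sketched below in NumPy. This version omits the learned query/key/value projections a real transformer has; it only shows the shape of the operation: embeddings in, same number of embeddings out, each row now a context-aware mix of the others.

```python
# Simplified self-attention: one set of embeddings -> another set.
import numpy as np

def attention(X):                    # X: (tokens, dim) embeddings
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)    # how much each token attends to each
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax per row
    return weights @ X               # new embeddings, same shape

X = np.random.rand(7, 16)            # 7 tokens, 16-dimensional embeddings
print(attention(X).shape)            # (7, 16): embeddings in, embeddings out
```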

Embeddings

Corpus: The entire body of language data the model analyzes and learns from during training

Training phase

Parameter: A learnable weight in the network, adjusted as the model runs through the corpus during the training phase looking for patterns. Model sizes are quoted as parameter counts (BERT: ~340M; GPT-3: 175B; GPT-4: ~1T, unofficially).
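
A sketch of what gets counted, assuming PyTorch (my choice for the example): every weight and bias in every layer is a parameter, so even a tiny two-layer network has thousands.

```python
# Count the parameters of a tiny network.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 50),  # 100*50 weights + 50 biases = 5,050 parameters
    nn.ReLU(),
    nn.Linear(50, 10),   # 50*10 weights + 10 biases  =   510 parameters
)

total = sum(p.numel() for p in model.parameters())
print(total)  # 5560 -- GPT-3's 175B is this, times roughly 31 million
```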

RAG or Retrieval-Augmented Generation: A technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources
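
A minimal RAG sketch: retrieve the stored passages nearest to the question, then hand them to the generator as context. `embed` and `llm_complete` are hypothetical stand-ins for your embedding model and LLM call; the prompt wording is illustrative only.

```python
# Retrieval-Augmented Generation, reduced to its two steps.
import numpy as np

def rag_answer(question, passages, embed, llm_complete, k=3):
    # 1. Retrieval: rank passages by cosine similarity to the question
    P = np.array([embed(p) for p in passages])
    q = embed(question)
    sims = P @ q / (np.linalg.norm(P, axis=1) * np.linalg.norm(q))
    top = [passages[i] for i in np.argsort(-sims)[:k]]

    # 2. Augmented generation: the fetched facts go into the prompt
    prompt = ("Answer using only these sources:\n" + "\n".join(top)
              + "\n\nQuestion: " + question)
    return llm_complete(prompt)
```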

Models

Probabilistic language model: Assigns a probability to every sentence in a language, in such a way that more likely sentences (in some sense) get higher probability
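
A toy example of such a model: a bigram model built from counts over a tiny made-up corpus, scoring a sentence via the chain rule P(w1..wn) ≈ P(w1) · P(w2|w1) · ... · P(wn|wn-1). Sentences made of common word pairs score higher; with no smoothing, unseen pairs score zero.

```python
# A bigram language model from raw counts.
from collections import Counter

corpus = "the cat sat . the dog sat . the cat ran .".split()
pairs = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def sentence_prob(sentence):
    words = sentence.split()
    p = unigrams[words[0]] / len(corpus)
    for a, b in zip(words, words[1:]):
        p *= pairs[(a, b)] / unigrams[a]  # P(b | a) from counts
    return p

print(sentence_prob("the cat sat"))  # likelier sentence, higher probability
print(sentence_prob("sat the cat"))  # unseen word order, probability 0 here
```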

Fine-tuning: A model whose weights have been further adjusted on an annotated dataset so that it performs better on more specific tasks
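
A fine-tuning sketch in PyTorch. The "pretrained" layer and the random labeled data here are stand-ins, not a real checkpoint or dataset; the point is only the mechanism: keep training existing weights on annotated examples so they shift toward the task.

```python
# Continue training pretrained weights on a small labeled dataset.
import torch
import torch.nn as nn

pretrained = nn.Linear(384, 2)  # pretend these weights come pretrained
optimizer = torch.optim.Adam(pretrained.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Annotated dataset: (embedding, label) pairs, random for this sketch
X = torch.randn(32, 384)
y = torch.randint(0, 2, (32,))

for epoch in range(3):          # a few passes nudge the weights
    optimizer.zero_grad()
    loss = loss_fn(pretrained(X), y)
    loss.backward()
    optimizer.step()
    print(epoch, loss.item())
```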

Quantized or distilled LLM: A slimmed-down LLM for slower machines. Quantization lowers the numeric precision of the weights; distillation trains a smaller model to imitate a larger one. They are distinct techniques with a similar goal.
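
A quantization sketch in NumPy: map float32 weights to int8 plus one scale factor, shrinking storage roughly 4x at a small cost in precision. Real schemes quantize per channel or block; this single-scale version is the simplest possible illustration.

```python
# Quantize a weight tensor to int8 and restore it.
import numpy as np

weights = np.random.randn(5).astype(np.float32)
scale = np.abs(weights).max() / 127            # one scale for the tensor
q = np.round(weights / scale).astype(np.int8)  # the slimmed-down weights

restored = q.astype(np.float32) * scale        # what the model computes with
print(weights)
print(restored)                                # close, but not identical
```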

Prompt

LLM usages: human-to-human or human-to-machine communication; audio-to-text; video-to-text; text-to-text (including prose-to-code)