salloc --partition=gpu-l40 --account=stf --mem=10G --gres=gpu:1 --cpus-per-task=1 --time=2:00:00
scontrol show job 24333466 | grep gpu
rm -rf '/gscratch/scrubbed/andysu/miniconda3'
bash Miniconda3-latest-Linux-x86_64.sh -p /gscratch/scrubbed/andysu/miniconda3
python -m pip install --force-reinstall --upgrade setuptools pip
sinfo -t idle
salloc --partition=ckpt-all --gres=gpu:1 --nodelist=g3091 --time=8:00:00
module load cuda/11.8.0
python -c "import torch; print(torch.cuda.is_available())"
import torch
print(torch.__version__)
print(torch.version.cuda)  # confirm the PyTorch build supports CUDA (None means a CPU-only build)
print(torch.backends.cudnn.enabled)
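The individual checks above can be gathered into one script, which is handy to run right after landing on a GPU node. This is only a minimal sketch: the function name `cuda_report` and the dictionary keys are made up for illustration, and it assumes nothing about the cluster beyond a working Python interpreter. If PyTorch is not installed in the active environment it reports that instead of crashing.

```python
import importlib.util


def cuda_report():
    """Collect the same facts as the one-off checks above into a dict.

    `cuda_report` and its keys are hypothetical names for this sketch.
    """
    # Avoid an ImportError when the active environment has no PyTorch.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_enabled": torch.backends.cudnn.enabled,
        "gpu_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` shows a version but `cuda_available` is False, the likely suspects are the allocation (no GPU granted to the job) or the driver and module setup on the node, rather than the PyTorch install itself.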
scontrol show job 24202314 | grep TRES
srun --jobid=<jobid> --pty bash
conda create --name my_env python=3.9
conda activate my_env