# Algorithms

## Heap Sort
```python
def heapify(input_array, index, bound):
    """Sift the subtree rooted at `index` into max-heap order.

    Parameters:
        input_array: the unsorted input array
        index: the root index of the subtree to heapify
        bound: the comparison boundary; if a root's left or right child
            index falls at or beyond it, that child is not examined

    If a root has index n, its node indices are:
        root: n
        left child: 2n + 1
        right child: 2n + 2
    For example, [0, 1, 2, 3, 4, 5, 6] forms the tree
            0
          1   2
         3 4 5 6
    0 is a root whose left child is 1 and right child is 2;
    1 is a root whose left child is 3 and right child is 4.
    The last half of the nodes are leaves, so comparisons only run up to
    the ceiling of half the array length; everything after that is a leaf
    and never needs comparing, so 3, 4, 5, 6 are skipped here.
    We start comparing from index 2, i.e. 7 // 2 - 1 = 2 (remember that
    Python indices start at 0).
    The root is always the largest (or smallest) of the three nodes, but
    the two children are not ordered relative to each other:
        5       5
       1 2     2 1    both arrangements on the left and right are valid
    """
    # Start by assuming the largest value sits at the root index.
    largest_idx = index
    left_idx = index * 2 + 1
    right_idx = index * 2 + 2
    # In the comparisons below we only care that the root is the largest,
    # so the two children never need to be compared with each other.
    # Note that the bound check must come first, otherwise an
    # out-of-range child index would raise an IndexError.
    if left_idx < bound and input_array[largest_idx] < input_array[left_idx]:
        largest_idx = left_idx
    if right_idx < bound and input_array[largest_idx] < input_array[right_idx]:
        largest_idx = right_idx
    # If the largest value moved, swap it into the root and recursively
    # heapify the affected subtree.
    if largest_idx != index:
        input_array[index], input_array[largest_idx] = input_array[largest_idx], input_array[index]
        heapify(input_array, largest_idx, bound)
def heap_sort(input_array):
    """Heap sort."""
    array_size = len(input_array)
    # Build the heap from the bottom up.
    # Not every element needs heapifying: every node from n // 2 onward
    # is a leaf, so only the indices before n // 2 are roots, and the
    # loop below covers only those. This pass just builds the heap.
    for i in range(array_size // 2 - 1, -1, -1):
        heapify(input_array, i, array_size)
    # Sorting pass.
    # After heapifying, the largest value is always at the top (assuming
    # a max heap), so swap the root with the last node, then heapify the
    # remaining n - 1 elements.
    for i in range(array_size - 1, 0, -1):
        # Swap head and tail after each heapify.
        input_array[0], input_array[i] = input_array[i], input_array[0]
        # Each iteration shrinks the heap by one element, so the bound
        # passed here is the loop's i, not array_size.
        heapify(input_array, 0, i)
    return input_array
```
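As a quick sanity check, a minimal sketch like the following can verify the implementation; the sample list and the comparison against Python's built-in `sorted` are illustrative additions, not part of the original post:

```python
import random

# Hypothetical smoke test: sort a small random list with heap_sort
# (defined above) and compare against Python's built-in sorted().
data = random.sample(range(100), 10)
print(heap_sort(list(data)) == sorted(data))  # expected: True
```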
Below are the results for five data sets (l1 through l5) at different data sizes:
Heap Sort | 20000 | 40000 | 60000 | 80000 | 100000 |
---|---|---|---|---|---|
l1 | 416 ms | 606 ms | 864 ms | 1.23 s | 1.51 s |
l2 | 255 ms | 721 ms | 876 ms | 1.15 s | 1.64 s |
l3 | 265 ms | 603 ms | 923 ms | 1.23 s | 1.8 s |
l4 | 297 ms | 562 ms | 946 ms | 1.16 s | 1.59 s |
l5 | 279 ms | 553 ms | 966 ms | 1.22 s | 1.66 s |
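The post does not show how these timings were produced; a minimal sketch of one plausible setup, assuming random integer lists and wall-clock timing via `time.perf_counter`, looks like this:

```python
import random
import time

# Assumed measurement loop (not from the original post): time heap_sort
# on a fresh random list at each of the benchmarked sizes.
for size in (20000, 40000, 60000, 80000, 100000):
    data = [random.randint(0, size) for _ in range(size)]
    start = time.perf_counter()
    heap_sort(data)
    elapsed = time.perf_counter() - start
    print(f'{size}: {elapsed * 1000:.0f} ms')
```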