contribute by < kdnvt
>
When allocating memory, the kernel uses kmalloc, vmalloc intead of malloc. In order to be familier with these API, I read the Memory Allocation Guide
and virtual memory and linux to realize the difference between kmalloc and vmalloc.
kmalloc returns a memory space which is allocated by slab allocator, and because it is "physical continuous", it is suitable for DMA. vmalloc returns memory allocated by page allocator, which is continuous only in virtual address.
The kmalloc is better used to allocate object less than page size, while vmalloc is suitable for memory which needs multiple pages.
In fibdrv, I think the memory won't be very large, so I use kmalloc to implement big number arithmetic. However, I wrap the memory allocation and release with the function bn_new, bn_znew, bn_free so I can change the implementation at any time.
The fibdrv homework description suggests the use of decimal string to implement the big number arithmetic, but I think it is a waste of space.
Therefore, I reference the method of KYG-yaya573142, using an array of unsigned long long, and bind all elements' bits as a big number binary representation.
For a number \(x\), decimal string needs \(\lfloor log_{10}x \rfloor\) bytes of memory , and my method only needs \(\lceil \lfloor log_2x \rfloor/8 \rceil\) bytes of memory. It's about \(8/log_210 \approx 2.4\) times memory usage.
In addition to memory usage, this implementation also benefits the data transformation between kernel and user space, because the data to be copied is much less.
This function allocates memory only for member num in bn_t. The reason I don't use it to allocate the whole structure is that I can declare variable directly and use bn_new like this:
However, after using this design to implement fibdrv, I think it is somewhat confusing the user. In addition to calling API, the user need to consider the behavior of bn_t, which is bad for a abstract interface.
bn_zrenew and bn_extend both extend the allocated memory of bn_ptr. bn_zrenew extends it and set all of bits to zero while bn_extend reserve the original value.
Notice that when using krealloc, instead of assign the return value to bn_ptr->num directly, I use tmp to keep its value and check it, in case that krealloc fails and the original bn_ptr->num will cause memory leak.
This function release the memory of num.
When other functions need to allocate, resize, or release function, I use these function instead of calling kmalloc, krealloc directly, so I can replace the kmalloc with other functions like kvmalloc, vmalloc if needed without changing other functions.
The bn_add_carry implement the big number addition. bn_sub and bn_add do the subtraction and addition with bn_add_carry. bn_move move value of the first big number to the second, while bn_toggle_move move 1's complement of the the first big number to the second.
Original edition free and allocate memory every time, while improved edition using resize function to implement, which is shown below:
This is my first implementation of multiplication. By traversing all bits of the first big number, if it is set then left shift the second big number with corresponding bits and add it so sum
.
However, the performance of it is terrible, so I change the implementation.
This edition multiply the 32 bits data per times. The reason for 32 bits is to prevent overflow.
This function computes the Fibonacci number by repeatedly adding previous two number.
The bn_to_string
function in client.c
transform big number binary representation to decimal string, the implementation code is as shown below:
After implementing the basic Fibonacci number computing, I decided to measure its performance first, so I can see the difference when improving it later.
I output the four types of execution time:
Subsequently, I use the gnuplot to draw the diagram. The initial diagram is as shown below:
As we can see, some interference existed. I followed the homework instructions and eliminated it by executing the shell file below:
In addition, I isolated the eighth core and disabled many interrupt on it by shell file below:
reference: KYG-yaya573142
Run the client on eighth core,and here is the diagram after these changes:
The measurement is still influenced by something. By using the command:
I can see that some interrupt still happen when executing client
.
Finally, I run the client
multiple times, remove the outlier and take the average.
The implementation in client
output files of data, each line in a file are data with the same Fibonacci number. After client
finishes, I run the data.py
to process the data. The code is as shown below:
referenced 4ce43c4
And here is the diagram:
For Fibonacci to 1000:
The bn_to_string
accounted for a large proportion of execution time, so I decided to improve it first. The original implementation needs to traverse the whole decimal string per bit of big number binary representation, which is a big overhead. I improved the performance by scanning 32 bits of big number at once, so it only needs to traverse the whole string per 32 bits. The reason for 32 bits is that I can use an unsigned long long integer to record the remaining carry number without overflow. The implementation code and execution time diagram is as shown below:
After I replacing the free and allocate function in bn_add
with resize funtion, the performance is improved a little:
This time, I tried to improve the Fibonacci number computing time by using fast doubling technique. The implementation code and diagram is as shown below:
The performance is even worse than the iterative version:
I think the multiplication may be the bottleneck, so I improved it with another implementation as mentioned above( bn_mult ), the diagram is as shown below:
And the iterative one is as shown below:
The improved fast doubling is better than the iterative one this time.