Henrik Håkansson
## Relevant concepts

- **Stack allocation**: We learned in Assignment 0 that there are two kinds of allocated memory: stack- and heap-allocated memory. Stack-allocated memory is obtained by declaring arrays inside functions, as opposed to declaring them globally. Memory allocated on the stack is limited in size but tends to be faster. Moreover, stack-allocated memory is thread-local and therefore provides an opportunity to untangle the mutual impact of parallel threads on one another. Consequently, it is an important consideration to use stack-allocated memory in the innermost iteration steps, i.e., the Newton iteration for an individual point in the complex plane. We plan to test this concept by comparing runtimes of program variants that use stack- and heap-allocated memory in the innermost iteration steps.
- **Memory fragmentation**: Memory fragmentation, meaning that allocated memory is scattered across main memory, can have a detrimental effect on the performance of a program. The archetypal example is a matrix, where allocating it as a single contiguous memory block can give a significant performance improvement over the naive approach of allocating, e.g., each row independently. The improvement typically comes from two effects: first, allocations themselves are expensive, so minimizing their number is preferable; second, contiguously allocated memory typically improves cache performance. Note that the second effect does not always hold, see for example the discussion of false sharing below. In C code this manifests itself as doing a single allocation (e.g., with `malloc`) instead of several independent allocations.
- **Locality**: A topic closely related to memory fragmentation is locality. Whereas memory fragmentation is concerned with how memory is allocated, locality concerns the access patterns of memory in conjunction with the allocation strategy. Again a matrix serves as an excellent example: a matrix allocated in row-major order is considerably slower to access in column-major order, since this leads to significantly worse cache performance. The loss is entirely due to poor spatial locality of the accessed memory: when traversing the matrix in column-major order, each memory access typically causes a cache miss, because for an NxN matrix consecutive column elements are separated by N-1 other elements, so for large enough N the cache has to be reloaded for every element accessed. The problem can be solved either by changing the allocation strategy, which in this example would mean allocating the matrix in column-major order, or by modifying the algorithm so that its access patterns align with the allocation strategy. The former approach is typically not preferable, as it quickly becomes unmaintainable for larger programs: the changed allocation strategy can affect other parts of the program that access the same memory. The preferred approach is therefore to modify the access patterns of the algorithm, since such a change usually affects only a local part of the program and is thus more maintainable (a short sketch of the single-block allocation and of the two traversal orders follows this list). While memory fragmentation and locality certainly have to be considered and exploited for this exercise, there is another aspect of the memory layout that needs to be taken into account when writing parallel programs, which we consider next.
- **False sharing**: False sharing occurs when, e.g., two threads read and at least one writes to memory locations that reside in the same cache line; this taints the cache line and causes the CPU to perform an automatic synchronization step. Note that, in contrast to a data race, the reads and writes of the two threads may touch two (from the programmer's perspective) completely independent variables, which is why this synchronization is performed automatically by the CPU. Hence, when planning the layout of a parallel program one should not blindly optimize for locality; false sharing must also be taken into account, and a good compromise between the two found.
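The following is a minimal sketch of the fragmentation and locality points above, assuming a square matrix of doubles and function names of our own choosing (it is not the assignment code): the matrix is backed by a single `malloc` block with row pointers for convenience, and the two traversal functions differ only in loop order, which is exactly what determines the spatial locality of the accesses.

```c
#include <stdlib.h>

/* Single contiguous block for an n x n matrix of doubles, plus a separate
   array of row pointers for convenient m[i][j] indexing.
   Error handling is omitted for brevity. */
double **alloc_matrix_contiguous(size_t n)
{
    double *block = malloc(n * n * sizeof *block);  /* one allocation for all entries */
    double **rows = malloc(n * sizeof *rows);       /* one pointer per row */
    for (size_t i = 0; i < n; ++i)
        rows[i] = block + i * n;
    return rows;
}

/* Good spatial locality: traverse in the same (row-major) order as the allocation. */
double sum_row_major(double **m, size_t n)
{
    double s = 0.;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            s += m[i][j];
    return s;
}

/* Poor spatial locality: column-major traversal of a row-major matrix;
   consecutive accesses are n elements apart and tend to miss the cache. */
double sum_col_major(double **m, size_t n)
{
    double s = 0.;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i)
            s += m[i][j];
    return s;
}
```

Swapping the loop order in `sum_col_major` back to row-major recovers the fast case; the allocation itself is unchanged, which is why fixing the access pattern is usually preferable to changing the allocation strategy.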
With the four concepts above explained, we can now address how they influence the memory layout of the program. Below is a list of the four data structures of the program whose memory layout needs to be carefully considered:

1. **Result matrices**: two NxN matrices are needed for saving the results of the Newton iterations.
2. **Synchronization array**: an array for communicating between the compute threads and the write thread.
3. **Arrays of colour strings**: to optimize the writing part, two arrays of precomputed colour strings are used.
4. **Array of roots**: to check convergence of the Newton iterations, an array containing the roots is needed.

The corresponding memory layouts for the above data structures are listed below, with a brief explanation for each item.

1. Items within a row allocated contiguously, but the rows themselves fragmented in memory (see the sketch after this list). This strategy was chosen since each row is accessed by one thread at a time, so keeping the elements of a row contiguous should improve cache performance without risking false sharing. On the other hand, several rows are accessed by multiple threads at the same time, so fragmenting the rows in memory should minimize the risk of false sharing.
2. Array allocated contiguously in memory. This may seem suboptimal considering false sharing, since a synchronization array is clearly accessed by several threads. However, with the access pattern of our program this array needs explicit synchronization anyway, so the effect of false sharing disappears.
3. A single contiguously allocated block of memory on the stack. Since only one thread is used for writing to file, and the array is not written to after it is created, false sharing is not a concern. It therefore makes sense to minimize the number of allocations and to optimize for locality, which is done by allocating the strings as a single block of memory. Furthermore, since these arrays are of limited size, they can reside on the stack.
4. Array contiguously allocated on the stack. The roots are of limited number and frequently accessed in the Newton iteration, so it makes sense to give each thread a local, stack-allocated copy of the array. Note that this has to be done carefully so that the array is created only once per thread.
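As an illustration of layout 1, the sketch below allocates the two result matrices with one allocation per row; the struct name, field names, and element types are our assumptions for the sake of the example, not the assignment's actual code.

```c
#include <stdlib.h>

/* Sketch of layout 1: each row is its own allocation, so elements within a
   row are contiguous (good locality for the thread working on that row),
   while different rows are likely to end up far apart in memory, which
   reduces the risk of false sharing between threads working on different rows. */
typedef struct {
    size_t n;      /* number of rows/columns */
    int  **attr;   /* attr[i] -> row i of attractor indices  */
    int  **conv;   /* conv[i] -> row i of iteration counts   */
} result_matrices_t;

static int init_results(result_matrices_t *r, size_t n)
{
    r->n = n;
    r->attr = malloc(n * sizeof *r->attr);
    r->conv = malloc(n * sizeof *r->conv);
    if (!r->attr || !r->conv)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        r->attr[i] = malloc(n * sizeof **r->attr);  /* one allocation per row */
        r->conv[i] = malloc(n * sizeof **r->conv);
        if (!r->attr[i] || !r->conv[i])
            return -1;
    }
    return 0;
}
```

Each compute thread then works on whole rows through `attr[i]` and `conv[i]`, while rows handled by different threads are unlikely to share a cache line.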
- **Inlining**: An inlined function is one where the compiler has substituted the body of the function into the calling code. This increases performance for frequently called functions, as the function call itself is omitted. In Assignment 1 we saw how this significantly improved the runtime of a function whose body was fast to execute, so that the call overhead took up a large part of the execution time. In this assignment, inlining is mainly worth considering for the computation function, since it is called many times. To get inlined functions in C one can add the `static inline` keywords to the function header. Note that this requires the function and the call site to be present in the same object file during linking, meaning that one either has to place the function in the same C file where it is used, or put its implementation in a header file.
- **Parsing from the command line**: In Assignment 0 we learned a simple method of parsing command line arguments by passing them as arguments to `main` and converting them with `strtol`. The same code could be reused in this assignment with only minor tweaks.
- **Writing to file**: There are different functions in C for writing to a file: `fwrite()` writes binary data, while `fprintf()` writes formatted text. Since it is possible to print a number of precomputed strings instead of regenerating them every time, `fwrite()` can be used, and timing considerations favour `fwrite()` over `fprintf()`.

## Intended program layout

Per the instructions, the program naturally splits into two subtasks: computing the Newton iteration, and writing the results to the two output files. Each subtask is implemented in a separate function intended to be run as the main function of a corresponding POSIX thread.

The computation of the Newton iteration can be further split into one part that depends on the degree $d$ and one part that does not. The $d$-independent task is mainly checking whether the algorithm has diverged, and this is simply done on every iteration. For the $d$-dependent tasks, checking whether the algorithm has converged and performing the actual calculation, an initialization function is used to minimize the number of times the switch statement that dispatches on $d$ has to be executed. The initialization function sets a function pointer to a hardcoded, optimized expression for computing the next step of the algorithm, and a pointer to an array of hardcoded roots of the polynomial. These pointers are then used in the Newton iteration without checking the degree again (a sketch of this dispatch is given below). The number of if statements is further reduced by comparing the current value $z$ with the roots only when the absolute value of $z$ is close to 1; this is especially helpful for higher-degree polynomials.

The writing thread first needs to compute the file headers and the colours that will be written to file, depending on the result of the Newton iteration. The actual writing of rows waits until the Newton iterations of the row of interest have been computed. When the row is done, the thread writes the corresponding colours for attractors and convergences. If the next row is also ready from the compute threads, it is written to file immediately after the previous one; otherwise the thread waits again until the next row is done.
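To make the degree dispatch concrete, here is a minimal sketch under naming assumptions of our own (it is not the assignment's actual code): a `static inline` step function hardcoded for $d = 3$, a table of the corresponding roots, and an initialization function that runs the switch on $d$ once and hands back the pointers used by the Newton loop.

```c
#include <complex.h>
#include <stddef.h>

/* Hardcoded Newton step for f(z) = z^3 - 1:  z - (z^3 - 1) / (3 z^2).
   static inline lets the compiler substitute the body at direct call sites;
   calls made through the function pointer below are generally not inlined. */
static inline double complex newton_step_d3(double complex z)
{
    double complex z2 = z * z;
    return z - (z2 * z - 1.0) / (3.0 * z2);
}

typedef double complex (*step_fn)(double complex);

/* Hardcoded roots of z^3 - 1, used for the convergence check. */
static const double complex roots_d3[3] = {
    1.0,
    -0.5 + 0.8660254037844386 * I,
    -0.5 - 0.8660254037844386 * I,
};

/* The switch on d runs once, before the iteration loops; afterwards the
   Newton iteration only uses the returned pointers. */
static int init_degree(int d, step_fn *step,
                       const double complex **roots, size_t *nroots)
{
    switch (d) {
    case 3:
        *step   = newton_step_d3;
        *roots  = roots_d3;
        *nroots = 3;
        return 0;
    /* ...cases for the other supported degrees... */
    default:
        return -1;  /* unsupported degree */
    }
}
```

A compute thread would call `init_degree` once during setup and afterwards only call `(*step)(z)` and scan `roots` inside its innermost loop, so the degree is never re-checked per pixel.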
