Kishore Kumar
# Doubts + Issues Live Document

This document is created to keep track of any doubts students might have regarding the assignment and to track any live issues with the assignment code/judge. Please refrain from pinging TAs on WhatsApp unless the doubt/issue is urgent. We will frequently monitor this document and reply to unanswered questions/address issues immediately. The first doubt / issue has been filled in by a TA. Please follow the same format when appending your queries / issues to this live document.

## Doubts

**Q.** When I click "Run Code" on the big board using my starter code, the judge shows me `FAILURE`, why?

> **Kishore:** The starter code provided to you is a sample and is too slow to execute within the timeout period on the judge. Students are expected to refer to the class code pushed [here](https://github.com/suresh-purini-iiith-courses/spp-spring24/blob/main/code/simd/convolve.c) and vectorize it correctly to achieve a successful run on the judge. Note that `convolution_avx2` does not handle edge cases. You need to fix it to account for them. Doing this should result in a submission that runs in around 8.5 seconds.
>
> Also, opening the latest record file associated with your run will give you the exact error message that caused a `FAILURE` run on the server.

**Q.** If `/tmp/student_sol.bmp` isn't deleted, then subsequent runs might seg fault.

> **Kishore:** Right. This is because of permission issues. If your solution crashed unexpectedly and `tester.cpp` did not get to run the code that removes the file from the filesystem, the next run of your code will segfault. The easiest fix is to modify the starter code / your `open` call for creating the solution file to:

```cpp
// Create (or truncate) the solution file, requesting rwx for user, group and others.
fd = open(sol_path.c_str(), O_RDWR | O_CREAT | O_TRUNC, S_IRWXU | S_IRWXG | S_IRWXO);
```

> This will make sure the permissions are set loose enough for future iterations to work without issues.
> You also need to delete the existing file `/tmp/student_sol.bmp` before executing your new code again.

**Q.** What does this line mean: "As long as your solution ensures that the relative error between yours and the naive solution given in the starter code is < 1e-6, it will be accepted"?

> **Kishore:** There is always some error associated with floating point computation, right? `((a+b)+c) != (a+(b+c))`. If / when you or the compiler vectorizes the code, you might have to shift around the order of computation, which might induce slight error. This error will be tolerated as long as the error between this and the order used by the provided solution is less than 1e-6.

**Q.** Can you share the details shared during the 2nd Tutorial here?

> **Kishore:** The following is what `-march=native` triggers:

```
-march=cascadelake -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mavx512f -mbmi -mbmi2 -maes -mpclmul -mavx512vl -mavx512bw -mavx512dq -mavx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mavx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mclwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mno-hle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mpku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=19712 -mtune=cascadelake -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection
-fcf-protection -dumpbase
```

> And this is the output of `numactl --hardware`, although all necessary information regarding this should be found in a more comprehensive form in the gist shared after the first tutorial. One missing datapoint was the memory info, which you can find below.

```
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46
node 0 size: 95182 MB
node 0 free: 78842 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
node 1 size: 96711 MB
node 1 free: 79747 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
```

> And this is the Roofline for the current fastest execution time on the board. If you beat it, ping me and I'll update this in the future. This will tell you whether, at a certain execution time, you're completely bottlenecked by bandwidth or there is still room for improvement.
>
> ![image](https://hackmd.io/_uploads/Bkcqo9MgC.png)
> ![image](https://hackmd.io/_uploads/SyKAo9zeC.png)

**Q.** What exactly is the final time we should be aiming for in this assignment? Is there any need for a report? And how will we be graded in this assignment?

> **Kishore:** The assignment is meant to challenge you to get the program to run as fast as you can get it to run. The hardware limitation is the roofline I've attached above. My code is close to roofline, but there is still room for improvement. You should aim to get it to run as close to roofline as possible. You will be graded on how well you've optimized the given code, essentially your best runtime on the server (you do not need to worry about minor variances). No, you don't need to submit a report, but we may ask the students with the fastest times to explain their solution in a tutorial / class.

**Q.** I am getting "Bus Error" on running locally even if I am deleting the `student_sol.bmp` every time.

> **Kishore:** The provided starter code does not do this. If it's the exact starter code then you can ping me on Teams.
> If you've made modifications then it's on you to debug your code. I'd assume it's either a file-related permission issue or an alignment issue.

**Q.** Just to confirm, we need to just push the code on GitHub and press "Run Code" on the server, right? Asking this because I got almost 3x speedup locally, but `FAILURE` still appears on the server.

> **Kishore:** Yes. You can clearly see the commit hash associated with each run on the judge. You can also see the error associated with each run simply by clicking on the record file associated with your run.

**Q.** What is the policy on using code or snippets found online? Do we cite its source somewhere in the code?

> **Kishore:** Yes. This applies to small snippets, not entire libraries. You can, however, refer to a library and write the code yourself.

**Q.** The code on my local machine gave some seg fault, but it runs successfully on the server. So I want to ask if the server also checks for correctness of the code? Shall I consider my code correct?

> **Kishore:** ... Yes, the server does check correctness. It however does not run an address sanitizer on your code. You might have lucked out / maybe you tried to execute instructions belonging to some ISA your local machine does not support? Regardless, if it is recorded on the judge then it will be considered as correct.

**Q.** Can we use AVX512?

> **Kishore:** .... I believe that's the whole point of this assignment.

**Q.** Can we optimise our code assuming some properties of the kernel?

> **Kishore:** Yes of course. It's intended you do that. You can also assume the exact size of the input file you'll be given. All of those parameters are fixed. Do anything and everything you can to reduce the runtime that appears on the board.

**Q.** How do we control the thread affinity?

> **Kishore:** <https://www.openmp.org/spec-html/5.0/openmpsu36.html>. And some other OMP stuff probably. Or you can also use `pthreads` and set `pthread_attr_t` with whatever specifics you want.
> Any threading library you use will have some method, just google it.

**Q.** How much do we need to optimise our code to get full marks? Under what time limit, basically?

> **Kishore:** Not decided. It will be something reasonable. In the ballpark of 2.5s should be safe.

**Q.** How exactly do I profile the code I've written if the instructions I am using (AVX512) are not available on my system? And also how do I profile the multithreaded code, as the results will likely change if I change the size of the input image, and the input image is too large to place in memory on my system? Not really asking about timing (already doing that), more so context switches and page faults.

> **Kishore:** You can't use profiling tools on the judge, yes. But runs on your system are still fairly indicative. If you really need to time specific stuff on the judge, you can use `std::chrono` to time sections of your code and flush the results to `stdout` or `stderr`. For context switches / page faults you can't really do much. I'm assuming you already have a very fast time (under 2 to 2.5s) if you're asking this. If that's true and you really want to tryhard, you might as well use Azure credits, get a Xeon chip and ssh into it.

**Q.** My same code runs with deviations of up to 1000ms.

> **Kishore:** You are likely getting screwed by the scheduler. If you don't pin your threads to cores (affinity / priority), you will be at the mercy of the scheduler. You can either pin them or you can be lazy ~~like me~~ and submit a few times to get a lucky run in.

**Q.** Same correct code shows failure once in a rare while and says I timed out.

> **Kishore:** This is a bug. I'm not sure why, but Docker has issues shutting down after finishing a multi-threaded run. Just submit again, it should happen relatively rarely.

**Q.** I've reordered the operations a bit to optimize speed and now the error threshold is too harsh.

> **Kishore:** I've relaxed the error threshold to 1e-4. This should be more than sufficient.
> If you have a really good case for me to relax this further to 1e-3 I could consider it. You can update the `error_threshold` variable in `tester.cpp` to reflect the same.

## Issues

- [ ] `rec_x` does not exactly match the run id.
  > **Kishore:** Noted. Given that it's not a major issue, this will be fixed at a later point in time.
- [ ] How to run the script on non-AVX systems?
  > **Kishore:** You mean no AVX2 either? Please modify `tester.cpp` for this; we will account for systems without AVX2 as well from the next assignment. The modification is fairly simple, just replace the AVX2 ifdef'd part with a simple `for` loop iterating over all elements of the file in a scalar manner.

  ```cpp
  // Scalar fallback: compare every element against the reference solution.
  for (int i = 0; i < file_size; i++) {
      if (std::abs(student_data[i] - sol_data[i]) > error_threshold)
          __terminate_gracefully("Error");
  }
  ```

  > You can also remove the `remaining` loop.

  > **Anonymous:** Intel provides an [SDE kit](https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html) to test AVX code. You can use this to check if your code is correct or not. I have not used it myself and so cannot comment on the performance slowdown that may come along with this.

## Requests

- Will it be possible to run all best submissions after the deadline on a fresh CPU to record final times, since there was variability in submissions based on the time of submission?
  > **Kishore:** The variability is fairly small and matters most when you're under 2s. And those who are dedicated enough to get it to 1.3-1.5s territory can also afford to submit repeatedly at whatever time they judge best. Running everything again is not sufficient, because the order in which I run programs matters (hot/cold cache) and the amount of time it would take to run 70 programs is enough for the CPU to toggle frequencies multiple times. Letting you average your best run across days is much better in contrast.
- Would it be possible to perhaps choose to get output from, say, `perf` for a run in the future? Would help in profiling.
  > **Kishore:** I understand. Depends on what sir is planning for future assignments. In the near future no, I'm too busy to implement it atm. If you do end up having a 3rd assign / a long 2nd assign then yeah, can probably do it.
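
For the floating-point tolerance doubt in the Doubts section, here is a minimal sketch of why reordering a sum (as vectorization does) can change the result, and what a relative-error check could look like. The names `sum_sequential`, `sum_four_lanes` and `relative_error` are hypothetical and are not taken from `tester.cpp`; the exact error formula the judge uses is an assumption here, so treat this as an illustration rather than the judge's method.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum left to right, the order a naive scalar loop would use.
float sum_sequential(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc += x;
    return acc;
}

// Sum with four independent accumulators, mimicking the order a
// 4-lane SIMD loop would use, then combine the lanes at the end.
float sum_four_lanes(const std::vector<float>& v) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4)
        for (std::size_t k = 0; k < 4; ++k) lane[k] += v[i + k];
    float acc = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < v.size(); ++i) acc += v[i];  // scalar tail for leftover elements
    return acc;
}

// Relative error of `got` against `ref` (hypothetical helper, assumed formula).
// Falls back to the absolute error when the reference is exactly zero.
double relative_error(double got, double ref) {
    double denom = std::fabs(ref);
    if (denom == 0.0) return std::fabs(got);
    return std::fabs(got - ref) / denom;
}
```

On a typical IEEE 754 single-precision target compiled without `-ffast-math`, the input `{1e8f, 1.0f, -1e8f, 1.0f}` is a deliberately extreme case: the two functions return different values because the small terms are absorbed at different points. Real image data is far better behaved, which is why a small tolerance is usually enough to absorb the reordering.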
