
# MC3D meeting notes

## Links

* Figma for brainstorming: https://www.figma.com/board/4AoEzVUkugh9VpOzHb7h1r/mcxd-design?t=qYMBg5dYegZ2pJ6n-1
* GitHub data repo: https://github.com/materialscloud-org/discover-mc3d-data

## Semantics

* **MC3D-source structure**: This is one unique structure obtained from the pipeline that extracts the structures from the databases, cleans them up, filters out issues, does the uniqueness analysis etc.
* **MC3D entry**: One entry that basically corresponds to one page in the MC3D page. One MC3D-source structure can have multiple entries for different “methodologies” (see below).
* **Methodology and version**: A methodology is a set of choices for the functional, level of theory, input parameters etc. The main difference between a methodology and a version is that we want to maintain multiple methodologies (e.g. for Hubbard or non-Hubbard structure), but a new version will always _override_ the previous one.

![image](https://hackmd.io/_uploads/rkM1P0aV0.png)

## Notes 2024-07-04

Checks for MC3D - possible tests:

* Setup of the magnetic states (Michail)
    * Check how initial magnetic states are set (include in paper)
    * What happens in restarts? (i.e. meta convergence)
    * Oscillation between different magnetic states; does the first BFGS end up in the same magnetic state after restart (in meta convergence)
    * Scatter plot: final magnetization vs initial magnetization (cite MP HT magnetism paper: https://doi.org/10.1038/s41524-019-0199-7)
    * How many systems go to non-magnetic states
* Mapping/uniqueness analysis of MC3D and other databases (MP, OQMD, AFLOW?)
  (Timo, start from Marnik's scripts)
    * Use StructureMatcher (to also consider duplicates that have been mapped) and pure ICSD ID
    * Do we end up with similar final structures?
    * Venn diagram MC3D, MP, OQMD (HT database comparison paper: https://doi.org/10.1103/PhysRevMaterials.7.053805) --> overlap of structures
        * Percentage of compounds with ICSD ID which are present in MC3D but not in MP (also vice versa) --> why?
        * In addition to the comparison of the plain IDs, also check if a representative structure is present
        * Which ICSD IDs are present in MP but not in MC3D (and vice versa)
        * Two diagrams: overall and comparison using "fair" conditions (i.e. MC3D is limited to N atoms per unit cell etc., consider this for the "fair" comparison)
    * What about the MPDS? Are those structures somehow represented in MP or do we actually extend the amount of unique crystal structures?
    * Compare volumes etc.
* Magnetism comparison (again: https://doi.org/10.1103/PhysRevMaterials.7.053805, check their methodology)
    * Scatter matrix
    * MP/MC3D magnetism
        * split in MP advanced and MP standard
    * First classification: magnetic/non-magnetic
    * Second: (regression) compare total magnetization
        * even on a site-specific level
    * Different results from structural relaxation?
* Check protocols: k-points, cutoffs, smearing --> Do query to crosscheck (Timo)
    * Check density of k-points in final scf
    * Check stress/pressure in the end
    * Do not only check the values in the database but rather some example input/output files to check input creation and parsing
* Checked high-temp, high-pressure, is theoretical

### For the export (and Kristjan):

* List of equivalent sources (icsd, mpds) -> the one that was actually calculated is highlighted
    * Probably only for PBEsol v2

### For band structures:

* How many empty bands are included
* Different counting for non-magnetic, magnetic and non-collinear
    * nspin 2: doubling number of kpoints
    * nspin 4: doubling number of bands
    * Use Fe as example to verify this

### Michail

* Create new HackMD presenting the statistics [:heavy_check_mark:]
    * How many structures we've run up to now
    * Further statistics
* AiiDA-SubmissionController --> issue: how to handle multiple queues

### Hero run

* Benchmarking for different system sizes (use random subset)
    * A set of 5 systems with number of atoms ranging from 10 to 40 in the unit cell will be used to get an idea of scaling.
* Check memory consumption
* Check for one structure that everything is running [:heavy_check_mark:]
* Use materials containing heavy elements -> benchmarking of non-magnetic/magnetic with and without spin-orbit

## Notes 2024-06-27

* Giovanni:
    * Contributed structures will be classified as a separate source, but we need to relax them with our methodology for them to be part of any of our subdbs
    * mc2d is not used because people don't find the properties (e.g. effective masses)
    * Should we include theoretical structures in mc3d?
    * Magnetic structures/properties: we should put a note about the method

## Notes 2024-06-17

TODO by next meeting:

- [ ] [MBx] Set up backups with Timo + Michael
- [ ] [MBx] Contact Martin re novel structures
- [x] [MBx] Check with the group re `psi18`. Check with Nicola C. re `psi15` if they still have too many resources.
  Check with Sara Bonella re `mr32`.
- [ ] [MM] We want to run 35,000 structures (thank you :pray:), have an estimate of how much time this will take.
- [ ] [MBx] Touch base with Simon/Anton/Joost on a SIRIUS users' day.
- [ ] [MM] Check memory reporting - try to use monitoring.

### Discussion

* [GP] Make note of old versions (PBE/PBEsol-v1), move k-points analysis of Nico to SI.
* [GP] Venn Diagram of various methods and overlap of structures. -> [MBx] this should also be based on the uniqueness analysis.
* [MBx] Do we make comparison of PBE/PBEsol(v1/v2)?
* [MM] How about the previous databases?
* [TR] There were also some differences in the elemental references -> which influences the formation energy. How to represent this in the MC3D? This is partially fixed by the FERE correction.
* [MBx] Move data to Thanos so we all have access.
* [GP] Please also set up the backups!
* [GP] Do a check of restoring the backup.

### Status of main MC3D paper

#### Intro

Not much to change here, just read through and update anything if needed.

#### Results

* Description of CIF import
* `CifCleanWorkChain`
    * Table 1: Overview of all issues
* Uniqueness analysis -> Fig. 1 Venn diagram
* Relaxation workflow
    * Fig. 2: Histogram of species/atoms + efficacy
    * Fig. 3: Histogram of volume diff VS experiment.

*What is missing:*

1. Discussion of "flags" (theoretical, high-pressure, ...)
1. Adapting text to use Gabriel's protocols
1. Mention missing hydrogen and "forthcoming" publication.
1. Update methodology (SSSP version, ...)
1. Refer to RoF paper
1. Update Relaxation workflow description
1. Fig. 3: Comparison of PBE/PBEsol?

#### 2A Efficacy of the workflows

* `BaseRestartWorkChain`
* Description of which failures are most prominent
* Which failures are fixed most.
* Table 2: number of meta-iterations of workflow.

*What is missing:*

* Fig. 2: Redo analysis efficacy + failures
* Now we run with SIRIUS -> update text regarding direct minimization.
* Discussion regarding meta-stable state: is that true?
* Update discussion on Pulay stresses and k-point meshes.

#### 2B Analysis of novel structures

[MBx] I think this was a contribution from Martin? But I have tried to reach out to him several times about this, unsuccessfully.

* Redo duplicate analysis on final structures
* Comparison with Materials Project and OQMD

#### Methods

List of required changes (broadly):

1. Importing: should remain largely the same.
2. Cleaning + Parsing: Also no changes.
3. Structure Uniqueness analysis: Update with sorting by space group.
4. Structure optimization: adapt to new workflow + protocol.
5. Parameter convergence studies: This is the most work I think. It's a text that was largely provided by Nicola Hormann. I think we'll cut out anything k-points related and refer to Gabriel's paper. Then we keep the other discussion. Seb still has the data, I can ask him for it.

### Status on the front-end

I think most things were discussed and agreed upon. Michael is working on the export for the latest data, which I can then also apply to the PBEsol-v1.

## Notes 2024-06-05

Present:

* Kristjan
* Marnik
* Michael
* Giovanni

Notes:

* Contributed structures
    * Do all structures need to be "consistent"
* Giovanni:
    * mc3d-123/source
    * when a method is applied, replace /source e.g. with /pbesol

### Marnik summary

The way I see the MC3D frontend is that by default if I search for a formula I find all "selected" entries for each MC3D-source structure. Then the user can still select "all" (which would show multiple entries for the same MC3D-source with different methods), or search for a specific methodology and version.

The version is tied to the methodology, not the structure. This makes it easier for the user to find consistent data. We should make this clear however, since it's possible that some structures will seem to have "missing" versions.

Notes re table on landing page:

1. The ID shown: should it be the "full" one, e.g.
   mc3d-1234/pbe-v1, or can we omit the version and always show the latest?
2. Properties: I think it makes sense to only show properties calculated for that entry (not for similar entries). Note that properties can be calculated with their own methodologies.

### Kristjan summary

* Organize all of the mc3d data as subdatabases of 2 types:
    * **structure dbs** that contain a set of structures calculated with a specific methodology, e.g.
        * pbe-v1
        * pbesol-v1
        * pbesol-v2
        * pbesol_u-v1
        * ...
    * **property dbs** contain properties calculated for structures that can be from different structure dbs, e.g.
        * wannier functions
        * stability
        * band structures
        * ...
* Each AiiDA subdatabase will have a separate AiiDA profile and explore section on Materials Cloud
    * Although a single explore section might be more user friendly, this solution is easier to maintain.
* The frontend should, by default, show the structures of all subdatabases in a unified interface ("select") but optionally can also be filtered to show a single subdatabase (e.g. "pbesol-v2").

## Notes 2024-05-27

Present:

* Kristjan
* Marnik
* Michael

Notes:

* Versioning
    * individual versions for each database or each entry? how to organize in the backend and how to present in frontend?
* Kristjan:
    * Could make sense to organize data in separate databases according to functional (PBE, PBEsol, SCAN)
        * Frontend could reflect this (e.g. in main page you select the DB you want to access)
    * All (unless somehow very unrelated) source .aiida databases should be imported to the same profile (e.g. PBE structures and any property that is directly calculated on top of them.)
        * this allows for the underlying provenance graph to be as complete as possible (e.g. you can go from property->relaxed structure->source structure)
        * reduced confusion (e.g.
          if you try to explore the property provenance in Materials Cloud, you can find the source structure UUID in the “property aiida profile” and the same one in the “structure aiida profile”, and I think this duplication can cause confusion)
    * If the structures are completely reoptimized from source using a different functional, that might make sense to put in a separate profile.
* Marnik:
    * Frontend should have a single interface for all materials. Some semantics so we know what we are talking about:
        * **MC3D-source structure**: This is one unique structure obtained from the pipeline that extracts the structures from the databases, cleans them up, filters out issues, does the uniqueness analysis etc.
        * **MC3D entry**: One entry that basically corresponds to one page in the MC3D page. One MC3D-source structure can have multiple entries for different “methodologies” (see below).
        * **Methodology and version**: A methodology is a set of choices for the functional, level of theory, input parameters etc. The main difference between a methodology and a version is that we want to maintain multiple methodologies (e.g. for Hubbard or non-Hubbard structure), but a new version will _override_ the previous one.
    * With the above terminology, I suggest the MC3D consists of _entries_ that each have their own _methodology_ and _version_. By default, the entries shown correspond to _one_ entry per MC3D-source structure, i.e. the one we consider “best” (this could be structure-dependent). That way even users that don’t understand what could be the best entry for an MC3D-source structure can just find the one _we_ consider best. However, we should allow users to look for only entries with a certain methodology in case they want consistency.
    * Each of these entries will show a list of properties calculated for _that_ structure.
      However, in case a property is missing for that entry, but available for an entry with the same MC3D-source structure, we can either link to it or just show the corresponding property directly.
    * All the underlying .aiida files should be imported in Materials Cloud to separate aiida profiles.
* By next week’s meeting: have concrete proposals and a document for them. Present, discuss and get feedback from the rest of the group, decide what structure makes most sense.
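
A minimal sketch of the entry model discussed in these notes, to make the terminology concrete. It assumes entry IDs follow the `mc3d-1234/pbe-v1` pattern used as an example for the landing-page table. The names `Entry`, `parse_entry`, `latest_versions` and `select_default`, and the single global preference list, are illustrative assumptions and not an agreed design. In particular, the notes say the “best” entry could be structure-dependent, which this sketch does not cover.

```python
import re
from dataclasses import dataclass

# Assumed ID layout "<source>/<methodology>-v<version>", e.g. "mc3d-1234/pbe-v1".
ENTRY_ID = re.compile(r"^(mc3d-\d+)/([a-z0-9_]+)-v(\d+)$")


@dataclass(frozen=True)
class Entry:
    source: str       # MC3D-source structure, e.g. "mc3d-1234"
    methodology: str  # e.g. "pbe", "pbesol", "pbesol_u"
    version: int      # a newer version overrides the previous one


def parse_entry(entry_id: str) -> Entry:
    """Split an entry ID into source, methodology and version."""
    match = ENTRY_ID.match(entry_id)
    if match is None:
        raise ValueError(f"not a recognised entry ID: {entry_id!r}")
    source, methodology, version = match.groups()
    return Entry(source, methodology, int(version))


def latest_versions(entries):
    """Keep only the newest version for each (source, methodology) pair."""
    latest = {}
    for entry in entries:
        key = (entry.source, entry.methodology)
        if key not in latest or entry.version > latest[key].version:
            latest[key] = entry
    return list(latest.values())


def select_default(entries, preference):
    """Pick one entry per source: the first methodology in `preference` that exists.

    `preference` is a hypothetical global ranking, most preferred first.
    """
    rank = {name: position for position, name in enumerate(preference)}
    selected = {}
    for entry in latest_versions(entries):
        if entry.methodology not in rank:
            continue  # methodology not eligible for the default view
        best = selected.get(entry.source)
        if best is None or rank[entry.methodology] < rank[best.methodology]:
            selected[entry.source] = entry
    return selected
```

Filtering for a single methodology (the “consistency” use case) would then just be `latest_versions` followed by a filter on `methodology`, without calling `select_default`.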
