import pathlib
import random
import re
mouse_names = [
    "mouse1",
    "mouse2",
    "mouse3",
]
random.seed(1985)  # Seed the random number generator so we get the same "random" numbers each time
mouse_codes = {name: random.randint(100000, 999999) for name in mouse_names}
# mouse_codes = {
#     "mouse1": 12345,
#     "mouse2": 23456,
#     "mouse3": 34567,
# }
cwd = pathlib.Path(".")
for path in cwd.glob("2022*"):
    # Grab contents of XML file
    if not path.is_dir():
        continue
    filepath = path / "Series.vxml"
    if not filepath.is_file():
        raise FileNotFoundError(f"no Series.vxml found in {path}")
    with open(filepath, "r") as fh:
        xml = fh.read()
    # Determine name, replace with code
    namematch = re.search(r'name="([^"]+)"', xml)
    if not namematch:
        raise ValueError(f"unable to determine name for {filepath}")
    name = namematch.group(1)
    if name not in mouse_codes:
        raise ValueError(f"no code for mouse {name}")
    new_code = mouse_codes[name]
    # Note: this replaces every name="..." attribute in the file, not only the first
    xml = re.sub(r'name="[^"]+"', f'name="{new_code}"', xml)
    # Overwrite the XML file
    with open(filepath, "w") as fh:
        print(xml, end="", file=fh)
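The substitution step in the script above can be tried on an in-memory string before any files are overwritten. The following is a minimal sketch, not part of the original script: the function name `anonymize_xml`, the constant `NAME_PATTERN`, and the sample XML are all hypothetical, and the real `Series.vxml` layout may differ.

```python
import re

# Same pattern the script uses to find and replace the name attribute
NAME_PATTERN = re.compile(r'name="([^"]+)"')


def anonymize_xml(xml: str, codes: dict) -> str:
    """Look up the code for the first name found, then replace every name="..." value with it."""
    match = NAME_PATTERN.search(xml)
    if match is None:
        raise ValueError("unable to determine name")
    name = match.group(1)
    if name not in codes:
        raise ValueError(f"no code for mouse {name}")
    return NAME_PATTERN.sub(f'name="{codes[name]}"', xml)


# Hypothetical sample input; the code value is made up for illustration
sample = '<Series name="mouse1" date="2022-09-09"><Image name="mouse1"/></Series>'
result = anonymize_xml(sample, {"mouse1": 123456})
print(result)
```

Because the pattern is not anchored to a word boundary, it would also match attributes whose names merely end in `name` (for example `filename="..."`), so it may be worth checking a sample file for such attributes first.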
mh22NH-27 mh22WL-005.v1 mh22WL-004.v2 mh20WL-026.v2 mh20WL-023.v1 mh20WL-022.v2 mh18WL-018 mh18WL-014 mh18WL-020 mh21USC-21qA mh21KK-324.v2 mh21WL-020.v2 mh19USC-19qB.v3 mh19USC-19qA.v2 mh19SHY-001.v3 mh14WL-028.v2 mh14SCUZJ-0008955 mh14SHY-003.v3 mh17USC-17qB mh17CP-002 mh17WL-011.v1 mh17WL-004.v1 mh17WL-022.v1 mh17WL-032 mh15WL-015.v2 mh15WL-042 mh15WL-034 mh15WL-004.v1 mh15WL-001.v4 mh15WL-031.v2 mh10WL-042 mh10WL-028 mh10WL-049 mh10SCUZJ-0082020 mh10WL-031 mh10WL-022.v2 mh16KK-259.v6 mh16WL-012.v1 mh16WL-030 mh03USC-3qC.v1 mh03LW-11 mh03FHL-003.v1 mh03WL-003 mh03FHL-001.v2 mh03SCUZJ-0388069 mh06WL-041 mh06WL-063.v2 mh06WL-017.v1 mh06WL-008.v1 mh06WL-051 mh06KK-008.v3 mh12USC-12qC mh12USC-12qA mh12KK-201.v2 mh12SHY-001.v2 mh12WL-002.v2 mh12SCUZJ-0392651 mh11PK-63643.v1 mh11WL-026 mh11WL-003.v1 mh11WL-005.v1 mh11KK-180.v2 mh11WL-036 mh08WL-030 mh08SCUZJ-0523897 mh08WL-058.v1 mh08WL-023 mh08WL-037 mh08WL-056.v2 mh07WL-067 mh07USC-7qA mh07WL-004.v1 mh07SCUZJ-0380111.v2 mh07SCUZJ-0502291.v2 mh07WL-022.v2 mh09WL-023.v2 mh09USC-9qC mh09WL-034 mh09WL-020 mh09SHY-001.v3 mh09WL-044.v2 mh13USC-13qB mh13LS-13qD mh13SCUZJ-0323513 mh13KK-217.v1 mh13WL-001.v2 mh13USC-13qA.v3 mh05USC-5qA mh05USC-5qB mh05WL-067.v1 mh05WL-059 mh05WL-040 mh05WL-026.v2 mh01WL-087 mh01WL-033.v1 mh01WL-005.v1 mh01WL-010.v1 mh01WL-006.v3 mh01WL-070 mh02KK-134.v4 mh02USC-2pC mh02WL-028.v1 mh02SHY-001.v1 mh02WL-003.v2 mh02WL-002.v2 mh04FHL-005.v2 mh04WL-012.v1 mh04SHY-001.v2 mh04WL-028.v1 mh04WL-052.v1 mh04WL-031.v2
Oct 5, 2023
Prompt I'm writing a Python program. I have data in the following structure. { "A": [ 10, 11 ], "B": [ 20
May 16, 2023
Abstract
Under construction.
Background
Microhaplotypes have emerged in recent years as a novel type of genetic marker with promising qualities, and interest in their applications continues to grow within the forensics, anthropology, and population genetics communities. A microhaplotype (microhap or MH) has been defined as a short region of DNA that 1) spans multiple common SNPs, 2) exhibits multiple allelic combinations, and 3) can be spanned by a single next generation sequencing (NGS) read (CITATION Kidd 2014).

In 2016, Kidd proposed nomenclature guidelines for microhaps (CITATION Kidd 2016). According to the proposed specification, each microhaplotype is assigned an identifier composed of a standard fixed prefix ("mh"), a two-digit chromosome label, a unique symbol representing the laboratory or principal investigator publishing the microhap, a hyphen, and a lab-specific number or designation. For example, mh05KK-170 refers to the Ken Kidd lab's microhap #170 on chromosome 5. This proposal has been adopted widely as a de facto standard in the forensic genetics literature (Table 1) and in community resources like the MicroHapDB database (CITATION Standage 2020).

Table 1. Identifiers of a representative set of microhaplotypes published by several independent laboratories.
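The identifier layout described in the Background can be illustrated with a small parser. This is only a sketch under stated assumptions: the names `MicrohapID`, `MICROHAP_PATTERN`, and `parse_microhap_id` are hypothetical, the lab symbol is assumed to consist of letters only, and everything after the hyphen (including any suffix such as `.v2`) is treated as the lab-specific designation.

```python
import re
from typing import NamedTuple, Optional


class MicrohapID(NamedTuple):
    chromosome: str   # two-digit chromosome label, e.g. "05"
    lab: str          # lab or PI symbol, e.g. "KK"
    designation: str  # lab-specific number or designation, e.g. "170"


# Assumption: "mh" + two digits + letters-only lab symbol + hyphen + designation
MICROHAP_PATTERN = re.compile(r"^mh(\d{2})([A-Za-z]+)-(\S+)$")


def parse_microhap_id(identifier: str) -> Optional[MicrohapID]:
    """Split an identifier into its parts, or return None if it does not fit the layout."""
    match = MICROHAP_PATTERN.match(identifier)
    if match is None:
        return None
    return MicrohapID(*match.groups())


for text in ["mh05KK-170", "mh13LS-13qD", "mh21KK-324.v2", "rs12345"]:
    print(text, parse_microhap_id(text))
```

Returning `None` for non-conforming strings rather than raising keeps the sketch easy to use as a filter over mixed marker lists; a stricter validator might raise instead.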
Mar 6, 2023
Bioinformatics Twitter is having...a moment. What began as one man's exasperated rant against poorly documented and distributed code has bloomed into a protracted debate about what constitutes "good" bioinformatics software and which kind(s) of interfaces developers should be expected to provide if they "really" want people to use their software.

For those who have been around long enough, this dialogue has a familiar tenor. I don't remember ever seeing a big controversy specifically about GUIs versus CLIs, but the dynamics on either side of the debate seem to play out again and again. Barring significant structural changes, we should expect to repeat variations of this argument with each new academic generation.

Since before the term "bioinformatics" was in common use, developers of bioinformatics software have become accustomed to having their contributions trivialized ("they're just glorified techs; we at the bench are the real scientists") and their expertise exoticized ("they're geniuses, wizards, masters of the arcane; I could never learn that"). These attitudes are frankly insulting, both to bioinformaticians and to bench scientists.

It is true---to an extent, when requirements are well understood and clearly articulated---that implementing bioinformatics software can be primarily a technical exercise (although the tendency to use "technical" as the opposite of "intellectual" is also mildly insulting). But the same can be said of bench work: when protocols are well established and the study system well understood, lab work is "just technical." The important intellectual component of bench work comes in the design and interpretation of experiments. The parallel in bioinformatics is the development of models and the design of software components and notation to guide implementation and build intuition, and the structuring and formatting of outputs to facilitate interpretation.
Of course, when studying any sufficiently interesting or novel question, software requirements will be poorly understood at the outset and elastic throughout development. Crafting accurate and stable software under such conditions requires considerable training, experience, agility, and creativity. Insinuating that if bioinformaticians really cared they would provide a "simple" GUI is a fallacy that perpetuates a dismissive attitude toward the work that goes into bioinformatics software engineering.

It's almost more uncomfortable when a biologist with a bench background introduces me to their friends as "a bioinformatician" with raised eyebrows and knowing glances, as if I'm an acolyte of some forbidden mystic art. Bench biologists routinely bring extensive domain expertise and technical competence to bear in the laboratory, interfacing with complex instrumentation and equipment to perform tasks of no small complexity. No bench scientist expects that they should be able to proficiently perform a sophisticated multi-stage lab procedure and interpret the outcome without a fair amount of time spent in reading, preparation, and trial & error. So it's a wonder that anyone regards bench scientists as incapable of using bioinformatics software for which menus, tabs, buttons, and drop-downs have not been provided. Nobody should pretend that this isn't condescending.

Both sides of the ongoing debate can probably agree on the sad state of bioinformatics software, too much of which is poorly documented, not very portable, difficult to install, and unreliable. The disagreement lies, at least partially, in the role that GUIs can or should play in addressing this issue, and how the attitudes described above are reflected, implicitly and unwittingly, in the debate.

With all that as context, I want to respond to a handful of common myths and fallacies about bioinformatics software, GUIs, CLIs, and related topics that have featured in the ongoing debate.
(And just in case any reader is unfamiliar, a GUI is a graphical user interface, and a CLI is a purely textual command line interface.)
Sep 9, 2022