Collaborators' script files
- file: /mnt/lustre/home/lunC/scripts/fileChecker.sh

Inner join
FileJoiner A B
FileJoiner A -posfilter=B or FileJoiner B -posfilter=A, depending on the purpose
Left join
FileJoiner A B,optional
FileJoiner A B,optional,outfields=… -outemptyfieldstring= (to pad out the missing fields of B)
Left join minus intersection with B
FileJoiner A -negfilter=B
Right join
FileJoiner B A,optional
FileJoiner B A,optional,outfields=… -outemptyfieldstring= (to pad out the missing fields of A)
Right join minus intersection with A
FileJoiner B -negfilter=A
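To make the semantics of these join types concrete, here is a small Python model of them on in-memory rows. It is only a sketch, not FileJoiner's code: the function names are hypothetical, the key is assumed to sit in the same column of both tables with unique values, and unmatched fields are padded with an assumed "NA" string.

```python
# Toy model of the join types above (illustration only, not FileJoiner itself).
# Assumptions: rows are lists of string fields, the key is in the same 0-based
# column of both tables, and keys are unique within each table.
from typing import Dict, List

Row = List[str]


def non_key(row: Row, key: int) -> Row:
    """All fields of a row except the key column."""
    return [f for i, f in enumerate(row) if i != key]


def inner_join(a: List[Row], b: List[Row], key: int = 0) -> List[Row]:
    """Keys present in both A and B; B's non-key fields appended (like 'FileJoiner A B')."""
    b_idx: Dict[str, Row] = {r[key]: r for r in b}
    return [ra + non_key(b_idx[ra[key]], key) for ra in a if ra[key] in b_idx]


def left_join(a: List[Row], b: List[Row], key: int = 0, empty: str = "NA") -> List[Row]:
    """Every row of A; B's fields or padding appended (like 'FileJoiner A B,optional')."""
    b_idx = {r[key]: r for r in b}
    width = len(b[0]) - 1 if b else 0  # number of non-key fields in B
    out: List[Row] = []
    for ra in a:
        rb = b_idx.get(ra[key])
        out.append(ra + (non_key(rb, key) if rb is not None else [empty] * width))
    return out


def semi_join(a: List[Row], b: List[Row], key: int = 0) -> List[Row]:
    """Rows of A whose key occurs in B, A's fields only (like '-posfilter=B')."""
    b_keys = {r[key] for r in b}
    return [ra for ra in a if ra[key] in b_keys]


def anti_join(a: List[Row], b: List[Row], key: int = 0) -> List[Row]:
    """Rows of A whose key does not occur in B (like '-negfilter=B')."""
    b_keys = {r[key] for r in b}
    return [ra for ra in a if ra[key] not in b_keys]
```

In this model a right join is simply left_join(B, A), mirroring how the notes swap the file order for the right-join commands.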
Hi Chang
Thanks for the feedback.
Described more accurately, the options are:
Inner join: FileJoiner A B [or FileJoiner A -posfilter=B, or FileJoiner B -posfilter=A, depending on the purpose]
Left join: FileJoiner A B,optional [most likely with ,outfields=… for file B, and -outemptyfieldstring= as well, to pad out the missing fields correctly]
Left join (minus intersection with B): FileJoiner A -negfilter=B
Right join: FileJoiner B A,optional [most likely with ,outfields=… for file A, and -outemptyfieldstring= as well, to pad out the missing fields correctly]
Right join (minus intersection with A): FileJoiner B -negfilter=A
There isn’t really a ‘Full join’ or a ‘Full join (minus intersection)’ as currently implemented; you’d have to combine two runs of the program: the two ‘-negfilter=’ commands above, concatenated and reformatted so that the fields line up correctly. It’s hard to do this automatically without knowing how the final file is meant to look, hence it is not implemented now. It could probably be implemented, but I don’t have time to work on it currently (due to the amount of testing needed).
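As a sketch of what combining the runs involves, the Python function below builds a full join from its three parts (matched rows, A-only rows, B-only rows) and pads the missing side so the fields line up. With exclude_intersection=True it keeps only the two anti-join parts, i.e. the full join minus the intersection. The output layout (key, then A's other fields, then B's) and the "NA" padding are assumptions; that layout choice is exactly what the email says is hard to automate.

```python
# Sketch: full outer join assembled from matched rows, A-only rows (as from
# FileJoiner A -negfilter=B) and B-only rows (as from FileJoiner B -negfilter=A).
# Assumptions: key in the same 0-based column of both tables, unique keys,
# "NA" padding, and output layout: key, A's other fields, B's other fields.
from typing import List

Row = List[str]


def full_join(a: List[Row], b: List[Row], key: int = 0, empty: str = "NA",
              exclude_intersection: bool = False) -> List[Row]:
    def rest(row: Row) -> Row:
        return [f for i, f in enumerate(row) if i != key]

    a_width = len(a[0]) - 1 if a else 0  # non-key fields in A
    b_width = len(b[0]) - 1 if b else 0  # non-key fields in B
    b_idx = {r[key]: r for r in b}
    a_keys = {r[key] for r in a}
    out: List[Row] = []
    for ra in a:
        rb = b_idx.get(ra[key])
        if rb is None:                   # A-only row: pad B's side
            out.append([ra[key]] + rest(ra) + [empty] * b_width)
        elif not exclude_intersection:   # key present in both files
            out.append([ra[key]] + rest(ra) + rest(rb))
    for rb in b:
        if rb[key] not in a_keys:        # B-only row: pad A's side
            out.append([rb[key]] + [empty] * a_width + rest(rb))
    return out
```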
Cheers
Scott
From: LunHsien Chang [mailto:luenhchang@gmail.com]
Sent: Friday, September 29, 2017 11:40 AM
To: Scott Gordon
Subject: Do SQL joins with FileJoiner
FileJoiner per-file options are separated by commas without whitespace. For example:
- -quiet suppresses warnings
- headerline is used when the files contain headers
- sep= defines the field separator (default: whitespace)
- key= specifies the merging key
- outfields= specifies the output fields; it allows ranges of fields (e.g. 1-5) and ranges with the upper bound omitted (e.g. 7-, from field 7 to the last field)
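One way to read these rules is sketched below in Python. It is not FileJoiner's parser; the function names are hypothetical, and it assumes that the first comma-separated item is the file name, that options are either NAME or NAME=VALUE, and that field numbers are 1-based and inclusive.

```python
# Sketch of interpreting a per-file option string and an outfields range.
# Assumptions (not confirmed for FileJoiner): the first comma-separated item is the
# file name, options are NAME or NAME=VALUE, field numbers are 1-based and inclusive.
from typing import Dict, List, Tuple, Union


def parse_file_spec(spec: str) -> Tuple[str, Dict[str, Union[str, bool]]]:
    """'B.txt,headerline,key=2' -> ('B.txt', {'headerline': True, 'key': '2'})."""
    name, *options = spec.split(",")
    parsed: Dict[str, Union[str, bool]] = {}
    for opt in options:
        flag, eq, value = opt.partition("=")
        parsed[flag] = value if eq else True  # bare flags become True
    return name, parsed


def expand_range(token: str, n_fields: int) -> List[int]:
    """'1-5' -> [1, ..., 5]; '7-' -> 7 up to the last field; '3' -> [3]."""
    start, dash, end = token.partition("-")
    if not dash:
        return [int(start)]
    last = int(end) if end else n_fields  # open-ended range runs to the last field
    return list(range(int(start), last + 1))
```

Because options are split on commas, a comma cannot appear inside an option value in this sketch.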
1.1 Left-join 2 tables, copying all columns from matched rows of the right table to the merged file (testing FileJoiner)
- left table: clumped.ADHD2017.1.snps
- right table: ADHD2017.betas.temp
- merged table: clumped.ADHD2017.1.betas.temp
1.2 Inner-join 2 tables, copying all columns from both tables to the merged file
- left table: meta.data
- right table: clumped.ADHD2017.1.betas.temp
- merged table: ADHD2017.1.betas.strand
2 Use a here document to create multiple script files for submitting a series of similar jobs
2.1 qsub_clumping_chr.sh is a Bash script that passes multi-line strings (here documents) to files, creating one script file per job for a series of similar jobs
- line 7-11: store, as variables, the directories that the sub-script files refer to
- line 15-24: two for loops
- line 27 & 43: create script files that are serially named by the two iterators, ${i} and ${j}
- line 28-41: content of the individual sub-script files
- line 30-32: PBS directives defining how the jobs run on the HPC. Don't put any non-PBS commands before this section
- line 34-41: plink commands
- line 46: submit the sub-script files as jobs on the HPC
- in if [ -f file ], note the spaces between the brackets and their content
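For comparison, the same pattern can be sketched in Python: two loops write one serially named job script per (i, j) pair, with the PBS directives placed before any other command, and each script can then be passed to qsub. Every directory, file prefix, job resource and plink input below is a hypothetical placeholder, not a value from qsub_clumping_chr.sh; only the standard plink flags --bfile, --clump and --out are used.

```python
# Sketch of the here-document loop in Python. All names (directories, file
# prefixes, job name, resources, plink inputs) are hypothetical placeholders.
import subprocess
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_job_scripts(out_dir: Path, geno_dir: str, assoc_dir: str,
                      chroms: Iterable[int], traits: Iterable[str]) -> List[Path]:
    paths: List[Path] = []
    for i in chroms:          # first iterator, e.g. chromosome
        for j in traits:      # second iterator (hypothetical: trait name)
            script = out_dir / f"clump_chr{i}_{j}.sh"  # named by both iterators
            script.write_text(
                "#!/bin/bash\n"
                # PBS directives must come before any other command
                f"#PBS -N clump_chr{i}_{j}\n"
                "#PBS -l walltime=01:00:00\n"
                "#PBS -l mem=4gb\n"
                "\n"
                f"plink --bfile {geno_dir}/chr{i} "
                f"--clump {assoc_dir}/{j}_chr{i}.assoc "
                f"--out {out_dir}/clumped_{j}_chr{i}\n"
            )
            paths.append(script)
    return paths


def submit(paths: Iterable[Path]) -> None:
    """Submit each generated script as a PBS job."""
    for p in paths:
        subprocess.run(["qsub", str(p)], check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_job_scripts(Path(tmp), "/path/to/geno", "/path/to/assoc",
                                   [1, 2], ["ADHD"]):
            print(p.name)
```

Writing the files first and submitting them in a separate step makes it easy to inspect a generated script before anything reaches the queue.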
GWAS ID remapper
An example of 3 quantitative traits
- script file: GWAS_CCO_P6IRT_S6IRT_SP12IRT.txt
- input data file (phenotype data file): uniqID_1stNonmissDepVar_keep.txt
- output file: ID_remapped_sp12_IRT.txt
- ID: ID
- traits: PSYCH6, SOMA6, SPHERE_sum, PSYCH6_IRT, SOMA6_IRT, SPHERE12_IRT
- covariates: age, ageSq (age²), nSEX (numeric sex), sexAge (sex by age), sexAgeSq (sex by age²)
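The derived covariates follow directly from their names. Here is a minimal sketch with a hypothetical function name, assuming nSEX is already numeric (its coding, e.g. 1/2, is not stated here) and that ageSq = age², sexAge = nSEX × age and sexAgeSq = nSEX × age²:

```python
# Sketch: derive the age/sex covariates from age and numeric sex.
# Assumption: nSEX is already numeric; its coding is not specified in these notes.
from typing import Dict, Union

Value = Union[str, float]


def add_age_sex_covariates(record: Dict[str, Value]) -> Dict[str, Value]:
    age = float(record["age"])
    sex = float(record["nSEX"])
    out = dict(record)                # keep ID, traits and original covariates
    out["ageSq"] = age ** 2           # age squared
    out["sexAge"] = sex * age         # sex-by-age interaction
    out["sexAgeSq"] = sex * age ** 2  # sex-by-age-squared interaction
    return out
```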
A script to compile PLINK .profile files, run separately for different chromosomes or blocks of markers, into a single file with appropriate summing.
All files must have been run for the same set of individuals (not necessarily for the same genetic dataset, although that would be the usual practice).
Files can be specified either as a list on the command line, or as a filename preceded by an '@' symbol, in which case the individual files are listed in that file rather than individually on the command line. The two methods can be mixed.
The .profile files can be gzipped (in which case their names must end in .gz).
Redirect the output to a new (destination) file.
e.g. $0 GRStest1_chr*.profile > GRStest1_compiled.profile
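A minimal Python sketch of such a compiler follows; it is not the script described above, and compile_profiles.py is a hypothetical name. It assumes whitespace-delimited .profile files with a header row, FID and IID in the first two columns, and summable columns CNT, CNT2 and SCORESUM (as PLINK 1.9 reports with the sum modifier of --score). An averaged SCORE column cannot be summed directly, so the sketch refuses it rather than producing wrong totals.

```python
# Sketch of compiling per-chromosome PLINK .profile files (not the original script).
# Assumptions: whitespace-delimited files with a header row, FID/IID in the first two
# columns, and summable columns CNT, CNT2 and SCORESUM (PLINK 1.9 --score ... sum).
import gzip
import sys
from typing import Dict, List, TextIO, Tuple

SUM_COLUMNS = ("CNT", "CNT2", "SCORESUM")


def expand_args(args: List[str]) -> List[str]:
    """Replace each '@listfile' argument by the file names listed in it (one per line)."""
    files: List[str] = []
    for arg in args:
        if arg.startswith("@"):
            with open(arg[1:]) as fh:
                files.extend(line.strip() for line in fh if line.strip())
        else:
            files.append(arg)
    return files


def open_text(path: str) -> TextIO:
    """Open plain or gzipped (.gz) files as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fmt(x: float) -> str:
    # Whole numbers print without a decimal point; others are rounded to 6 decimals
    return str(int(x)) if x.is_integer() else str(round(x, 6))


def compile_profiles(paths: List[str]) -> Tuple[List[str], List[List[str]]]:
    header: List[str] = []
    sum_idx: List[int] = []
    rows: Dict[Tuple[str, str], List[str]] = {}    # first file's fields per individual
    sums: Dict[Tuple[str, str], List[float]] = {}  # running totals of SUM_COLUMNS
    for n, path in enumerate(paths):
        with open_text(path) as fh:
            table = [line.split() for line in fh if line.strip()]
        hdr, body = table[0], table[1:]
        if n == 0:
            if "SCORE" in hdr:
                raise ValueError("averaged SCORE column found; sums (SCORESUM) are needed")
            header = hdr
            sum_idx = [hdr.index(c) for c in SUM_COLUMNS if c in hdr]
            rows = {(r[0], r[1]): r for r in body}
            sums = {key: [0.0] * len(sum_idx) for key in rows}
        elif hdr != header:
            raise ValueError(f"{path}: header differs from {paths[0]}")
        if len(body) != len(rows) or {(r[0], r[1]) for r in body} != set(rows):
            raise ValueError(f"{path}: not the same set of individuals")
        for r in body:
            acc = sums[(r[0], r[1])]
            for k, j in enumerate(sum_idx):
                acc[k] += float(r[j])
    compiled = []
    for key, r in rows.items():
        merged = list(r)
        for k, j in enumerate(sum_idx):
            merged[j] = fmt(sums[key][k])
        compiled.append(merged)
    return header, compiled


def main(argv: List[str]) -> None:
    header, rows = compile_profiles(expand_args(argv))
    print(" ".join(header))
    for r in rows:
        print(" ".join(r))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would mirror the example above: python compile_profiles.py GRStest1_chr*.profile @more_files.txt > GRStest1_compiled.profile, where more_files.txt is a hypothetical list file.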
Remap ID using GWAS_ID_remapper
- location of GWAS_ID_remapper: /reference/genepi/GWAS_release/Release8/Scripts/GWAS_ID_remapper
- options
  - -intabdelim for a tab-separated input file
  - -headerline if the input file has a header
  - -outpedigree inserts five columns of pedigree information into the output file:
    - FAMID
    - ID
    - FATHERID
    - MOTHERID
    - GENDER
- path of the phenotype data file: /working/lab_nickm/lunC/sphere/uniqID_1stNonmissDepVar_keep.txt
- output file name: ID_remapped_sp12_IRT.txt (contains the pedigree information and the phenotype data)
fieldnumbers
script content
- Usage: match.pl -f file1 -g filetwo -k 1 -l 2 -v 3 4 7
- -f specifies the name of file1 (left table)
- -g specifies the name of file2 (right table)
- -k specifies the position of the key variable in file1 (1 here)
- -l specifies the position of the key variable in file2 (2 here)
- -v specifies the positions of the value columns in file1 (3, 4 and 7 here) to be added to file2
- -v is required; it cannot be omitted
- In the merged file, the added columns are set to a dash (-) for unmatched rows (i.e. keys not found in file1); rows common to both file1 and file2 carry the original values
- you should get an overlap of more than 1 million SNPs; if not, you may need to use another SNP ID for the merge (if I remember correctly, the QIMR data mostly use the CHR:BP ID type)
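The matching rule described for match.pl amounts to appending selected file1 columns to every file2 row. Below is a Python sketch of that behaviour, not match.pl's code. Positions are 1-based as in the usage line, unique keys in file1 are assumed, and the overlap counter is an addition that makes the "more than 1 million" check easy.

```python
# Sketch of the behaviour described for match.pl (not its actual code).
# Assumptions: rows are already split into fields, -k/-l/-v positions are 1-based,
# and keys in file1 are unique (if repeated, the last row wins here).
from typing import List, Sequence, Tuple

Row = List[str]


def match(file1: List[Row], file2: List[Row], k: int, l: int,
          v: Sequence[int], missing: str = "-") -> Tuple[List[Row], int]:
    """Append file1's columns v to every file2 row; '-' where the key is absent."""
    lookup = {row[k - 1]: [row[c - 1] for c in v] for row in file1}
    merged: List[Row] = []
    overlap = 0
    for row in file2:
        values = lookup.get(row[l - 1])
        if values is None:
            values = [missing] * len(v)  # key not found in file1
        else:
            overlap += 1                 # key present in both files
        merged.append(row + values)
    return merged, overlap
```

A much smaller overlap than expected usually means the two files identify SNPs differently (e.g. rs IDs in one, CHR:BP in the other).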
5.2 Test match.pl using test data files. Suppose there are two files with 2 rows common to both:
- temp_meta_10rows (file1 or left table in SQL language)
- temp_ADHD2017_11rows (file2 or right table in SQL language)