SAS programming for CDISC SDTM variables

###### tags: `SAS` `CDISC` `SDTM` `PRXCHANGE` `intnx` `compbl` `--TERM` `CMISS` `%sysfunc` `countw` `%scan` `%str` `prxmatch` `length` `--DTC` `datepart` `input` `delete` `libname` `filename` `%substr` `%length` `retain` `PROC SORT sortseq` `merge` `array` `do` `put` `COMPBL` `%do` `%sysfunc` `countw` # SAS programming for CDISC SDTM variables ## Create SAS formats from specfication Excel file :::warning How to use column Codelists[Codelist], Codelists[Code Text] and Codelists[CDISC_Submission_Value] from the spec file [C-200-001_sdtm - Copy.xlsx](https://docs.google.com/spreadsheets/d/1sheRYbj0celxFHllRxlP9xliD2UD24Qg/edit?usp=drive_link&ouid=100470085310127741095&rtpof=true&sd=true) ![](https://hackmd.io/_uploads/rk89s93GT.png) to create a SAS format named `$CL_DEMS0101F` where the label and start variables take values from the Codelists[CDISC_Submission_Value] and Codelists[Code Text] and the format name is created by modifying the Codelists[Codelist]: ![](https://hackmd.io/_uploads/rkVFPhhzT.png) ::: :::success * **Core functions**: `PRXMATCH()` to check the last character of a variable is a number `PROC FORMAT` to export the created formats, informats * **SAS file**: [m_create_formats_from_Codelists.sas](https://drive.google.com/file/d/1xyCy8sUx3cUfpmKHEJ73COiBpo4MGO_x/view?usp=drive_link) ```sas! %macro m_create_format_Codelists(m_Codelist=, m_type=); /*Testing macro. Comment this out when macro is working*/ /*%let m_Codelist=CL.DEMS0101;*/ /*%let m_type=c;*/ /*---------------------------------------------------------------------------- Macro parameters m_Codelist: Copy a value from Codelist column m_type: (1)c if to create a character format (2)i if to create a numeric format Examples of running the macro: %include "\\sas\repos\&studyid.\qc\sdtm\macro\m_create_formats_from_CRFCodeList.sas" ; %m_create_format_Codelists(m_Codelist=CL.RLB.EYEC, m_type=c); ------------------------------------------------------------------------------*/ /*because Excel field names often have spaces*/ options validvarname=any; /*Read in spreadsheets */ libname specs XLSX "\\sas\repos\&studyid.\source\meta\&protnum._sdtm.xlsx" access=readonly; /*Read in the Codelists sheet*/ data work.Codelists; set specs.Codelists; run; /*Create a format name using &m_Codelist. considering if the end is a number*/ data _NULL_; pattern_last_digit_number='/\d{1}$/'; /*Replace dot with underscore in &m_Codelist.*/ _format_name = tranwrd("&m_Codelist",".","_"); Codelist_last_digit_num=prxmatch(pattern_last_digit_number, compress(_format_name)); /*Create format name when the last character is a number*/ if Codelist_last_digit_num > 0 then format_name=compress(_format_name||"F"); /*Create format name when the last character is NOT a number*/ else format_name=_format_name; call symput("format_name",trim(format_name)); run; %put format_name=&format_name.; data work.&format_name.(keep=start label fmtname type); set work.Codelists( keep=Codelist "Code Text"n CDISC_Submission_Value where=(Codelist="&m_Codelist.") rename=("Code Text"n=start CDISC_Submission_Value=label)); /*data types: either c for character, or i for numeric*/ type="&m_type."; /*Create format name*/ if type="c" then do; fmtname=compress("$"||"&format_name."); end; /*Create informat name*/ else if type="i" then do; fmtname="&format_name."; end; run; /*Export format informat*/ proc format library=work cntlin=work.&format_name. ; run; libname specs clear; %mend m_create_format_Codelists; /*Calling the macro*/ %m_create_format_Codelists(m_Codelist=CL.DEMS0101, m_type=c); ``` ::: --- ## Sorting data by character variable with number :::warning **Problem** Sorting a character variable that contains number results in data sorted textually. In the following code, the sortkey IDVARVAL is character in type. Sorted data undesirably place 10, 11, 12,..., 19 between 1 and 2. ```sas! proc sql; create table &supplementalDatasetName._trimmed2 as select * from &supplementalDatasetName._trimmed order by &&&supplementalDomainName._datasetSortKeys; quit; ``` ![](https://i.imgur.com/T5Jp9D6.png) ::: :::success * **Core functions**: Use `PROC SORT sortseq=linguistic(numeric_collation=on);` to sort data numericaly without changing the sort keys * **SAS file**: [qc_pe.sas](https://drive.google.com/file/d/16BM4iV06fapuAbwb6Fke7Pw26JY3H5Uw/view?usp=drive_link) * **Reference**: [Proc sort (ascending) takes 11 12 13, etc. before 2](https://communities.sas.com/t5/SAS-Procedures/Proc-sort-ascending-takes-11-12-13-etc-before-2/td-p/140691) ```sas! proc sort data=&supplementalDatasetName._trimmed2 sortseq=linguistic(numeric_collation=on); by STUDYID RDOMAIN USUBJID IDVAR IDVARVAL QNAM; run; ``` ![](https://i.imgur.com/edOXTPr.png) ::: --- ## Select last non-missing values within a by group :::success * **New variables**: LBBLFL * **Core functions**: * Use `WHERE` to first filter out missing values and then use `IF last.var` statement to locate the last non-missing value of a variable within the `BY` group, creating a temp dataset. Try to specify all the subsetting conditions in the `WHERE` statement. The results can be different if the conditions are specified in the `IF` statement. * Use `MERGE` to join the temp dataset back to the working dataset * **SAS file**: [qc_lb.sas](https://drive.google.com/file/d/1U2nyXysWQEeTda_othisFsS7zumsBzY1/view?usp=drive_link) * **Reference**: [Find first and last non-missing value by group](https://communities.sas.com/t5/SAS-Procedures/Find-first-and-last-non-missing-value-by-group/td-p/357207) ```sas! /*Create LBBLFL*/ data WORK.LBPREG_DM_LBBLFL; length LBBLFL $2; set WORK.LBPREG_DM; by USUBJID; where LBORRES not="" and LBDTC_datepart_n <=RFXSTDTC_datepart_n ; if last.USUBJID then LBBLFL="Y"; run; /*NOTE: The data set WORK.LBPREG_DM_LBBLFL has 12 observations and 56 variables.*/ /*Merge LBBLFL back*/ data work.LBPREG_DM_2; merge work.LBPREG_DM(in=T1) work.LBPREG_DM_LBBLFL(keep=USUBJID EventDate LBBLFL in=T2); by USUBJID EventDate; if T1; run; /*NOTE: The data set WORK.LBPREG_DM_2 has 24 observations and 56 variables.*/ ``` ::: --- ## String manipulation [PRXCHANGE Function](https://documentation.sas.com/doc/en/vdmmlcdc/8.1/lefunctionsref/n0r8h2fa8djqf1n1cnenrvm573br.htm) `PRXCHANGE( perl-regular-expression | regular-expression-id, times, source)` Required Arguments **perl-regular-expression** specifies a character constant, variable, or expression with a value that is a Perl regular expression. `s/pattern-to-look-for/pattern-to-replace/` **regular-expression-id** specifies a numeric variable with a value that is a pattern identifier that is returned from the PRXPARSE function. Restriction If you use this argument, you must also use the PRXPARSE function. **times** is a numeric constant, variable, or expression that specifies the number of times to search for a match and replace a matching pattern. Tip: If the value of times is –1, then matching patterns continue to be replaced until the end of source is reached. **source** specifies a character constant, variable, or expression that you want to search. [COMPRESS Function](https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/ds2ref/n01p24mgwawxa2n1hy4gwe5t59xf.htm) `COMPRESS(variable, character-expression[, character-list-expression] [, modifier(s)])` Arguments **character-expression** specifies any valid expression that evaluates to a character expression and from which specified characters will be removed. Requirement: Enclose a literal string of characters in single quotation marks. Data type: CHAR, NCHAR, NVARCHAR, VARCHAR. See DS2 Expressions in SAS DS2 Programmer’s Guide **character-list-expression** specifies a variable or any valid expression that initializes a list of characters. By default, the characters in this list are removed from character-expression. Requirement: Enclose a literal string of characters in single quotation marks. Data type: CHAR, NCHAR, NVARCHAR, VARCHAR. See DS2 Expressions in SAS DS2 Programmer’s Guide. **modifier** specifies a character constant, variable, or expression in which each non-blank character modifies the action of the COMPRESS function. Blanks are ignored. The following characters can be used as modifiers: * a or A adds alphabetic characters to the list of characters. * d or D adds digits to the list of characters. * k or K keeps the characters in the list instead of removing them. ### Remove special characters :::success * **Input variables**: RANDARM * **New variables**: EXTRT * **Core functions**: `PRXCHANGE()` `COMPRESS(x,,`kad`)` * **SAS file**: [qc_ex.sas](https://drive.google.com/file/d/12Hsa4dfHsSU7fbayPOHoLgsw3TcVsXE1/view?usp=drive_link) * **Reference**: [Remove all special characters](https://communities.sas.com/t5/SAS-Programming/remove-all-special-characters/td-p/586007) [The Amazing COMPRESS Function](https://blogs.sas.com/content/sgf/2021/07/27/the-amazing-compress-function/) ```sas! /*Remove special characters '(' and ')' from RANDARM*/ UPCASE(PRXCHANGE('s/$|$//',-1,RANDARM)) as EXTRT ``` ```sas! data _null_; x='/a/s/vvv122!!@#jfjfjf2222/-'; y=compress(x,'','kad'); /* The COMPRESS function uses 3 modifiers, 'k', 'a' and 'd'. This means keep the alphabetic characters and digits, remove anything else. Note that the modifiers must be in the third argument*/ put y; run; ``` ```sas! Valid = compress(Contains_Junk,'.,','kda'); /*because the third argument (modifiers) contains a 'k', the second argument is a list of characters to keep. In addition to periods and commas, you use modifiers to include all digits, uppercase and lowercase letters (the 'a' modifier - 'a' for alpha)*/ ``` ::: Example 1: Compressing Blanks Example 2: Compressing Uppercase Letters Example 6: Compressing Space Characters by Using a Modifier Example 7: Keeping Characters in the List by Using a Modifier ```sas! data test(overwrite=yes); dcl char a b c d y; method run(); a='AB C D '; b=compress(a); /*Remove blank from a*/ c=compress(a, 'A'); /*Remove uppercase letters from a*/ d=compress(a, '', 's'); /* Remove spaces from a by using the modifier “s”.*/ y=compress(a, 'ABCD', 'k'); /*Keep characters ABCD from a by using the modifier “k”.*/ end; enddata; run; ``` --- ### Extract string from a macro variable :::success * **Core functions**: `%substr()` * **SAS file**: [p21_merge_2nd_time_plus.sas](https://drive.google.com/file/d/1JpR8P1fRGSXic357fUHhBKcQoAZOVa8h/view?usp=drive_link) ```sas! %let ocv_unco = pinnacle21-report-2022-08-26T11-49-55-858.xlsx; %let latest_time = %substr("&ocv_unco.", %length("&ocv_unco.") - 28, 23); %put &latest_time.; ``` ::: --- ## Datetime ### Check existence of a date component in a datetime variable :::success * **Core functions**: `PRXMATCH()` `length()` * **SAS file**: [qc_pe.sas](https://drive.google.com/file/d/16BM4iV06fapuAbwb6Fke7Pw26JY3H5Uw/view?usp=drive_link) ```sas! /*--------------------- Create DOMAIN dataset ---------------------*/ /*Stack all subsets to create DOMAIN dataset*/ data work.PE_VLM_all; set work.unilateral_PEORRES_num WORK.CSNFCSNFSTS_FSSCTOT work.unilateral_PEORRES_char work.unilat_PEORRES_char_PESTAT work.unilat_PEORRES_num_PESTAT work.PESTAT_notdone; /*Create DOMAIN, RDOMAIN*/ DOMAIN="&domainName."; RDOMAIN=DOMAIN; /*Get dateparts of PEDTC*/ pattern_date='/\d{4}-\d{2}-\d{2}/'; if prxmatch(pattern_date,PEDTC)>0 and length(PEDTC)>=10 then PEDTC_datepart_n=datepart(input(PEDTC,B8601DT.)); else PEDTC_datepart_n=.; format PEDTC_datepart_n yymmdd10.; run; /*NOTE: The data set WORK.PE_VLM_ALL has 36896 observations and 43 variables.*/ ``` ::: --- ## Increment variables by 0.01 :::success * **New variables**: SDTM.SV.VISITNUM, SDTM.SV.VISIT, SDTM.SV.VISITDY, SDTM.SV.SVUPDES. The goal is to increment VISITNUM by 0.01 in unscheduled visits using VISITNUM of scheduled visits as ![](https://i.imgur.com/AdR1MDT.png) * Note that the input data needs to be presorted by SubjectId and value (EventDate) * **Core functions**: `BY` `RETAIN` `compress()` * **SAS file**: [qc_sv.sas](https://drive.google.com/file/d/1x8Bq4fuaHNTBTm6bUBAaCB68MvjPH7JC/view?usp=drive_link) ```sas! /*Sort dataset WORK._DATE_VALUES_3 by SubjectId and value (EventDate in YYYY-mm-dd)*/ %m_create_format_Codelists(m_Codelist=CL.SVUPDES, m_type=c); /*Create final VISITNUM, VISIT, VISITDY by modifying values of VISITNUM, VISIT, VISITDY from scheduled visits*/ data work._date_values_4(drop=_:); length VISIT $100 SVUPDES $200; set WORK._DATE_VALUES_3(rename=(VISITNUM=_VISITNUM VISIT=_VISIT VISITDY=_VISITDY)); by SubjectId value; retain VISITNUM VISIT VISITDY; /*Create final VISITNUM, VISIT, VISITDY for scheduled visits*/ if EventName not="UNSCHEDULED VISIT" and _VISITNUM not=. then do; VISITNUM=_VISITNUM; VISIT=_VISIT; VISITDY=_VISITDY; SVUPDES=""; end; /*Create final VISITNUM, VISIT, VISITDY for unscheduled visits*/ else if EventName ="UNSCHEDULED VISIT" and _VISITNUM =. then do; VISITNUM=VISITNUM+0.01; VISIT="UNSCHEDULED "||compress(put(VISITNUM,8.2)); VISITDY =.; SVUPDES=put(UNSCHREAS, $CL_SVUPDES.); end; run; ``` ::: --- ## Read a single CSV file :::success * **Core functions**: `PROC IMPORT` * **SAS file**: [qc_dm.sas](https://drive.google.com/file/d/1Yf9zABZLQgzN-L1TeUvRIhjUWi1H8Tlq/view?usp=drive_link) ```sas! /*Read duplicated subject file*/ proc import datafile="\\sas\repos\&studyid.\source\meta\&protnum._dupe_subs.csv" out=work.dup_subjects DBMS=CSV replace; getnames=YES; run; ``` ::: --- ## Read Excel spreadsheets :::success * **Core functions**: `LIBNAME libref XLSX "excel-file-path"` assigns a short name, libref, to an Excel file path. `SET libref.sheetname` reads the sheetname into SAS.`FILENAME filref 'file-path'` assigns a short fileref to a filepath. `PROC IMPORT` * libref is useful when referring to a SAS dataset in the form of libref.sas_data_name * You cannot concatenate libref and file name to make a file path * If the libref is pointed to a path that doesn't exist, SAS log gives a note rather than a warning ![](https://i.imgur.com/how7a1G.png) * **SAS files**: [qc_dm.sas](https://drive.google.com/file/d/1ZhshGGbyhn203Kg5FVraGx_nbgTtQlHF/view?usp=drive_link) [qc_pd.sas](https://drive.google.com/file/d/1kHHeC1kZYHk55MyE8SZw6koXC0jCFJv6/view?usp=drive_link) ```sas! /*Read the spreadsheet DATASETS from &protnum._sdtm.xlsx into SAS */ libname specs XLSX "\\sas\repos\&studyid.\source\meta\&protnum._sdtm.xlsx" access=readonly; /*Read in the Datasets sheet*/ data work.Datasets; set specs.DATASETS(where=(Domain="&domainName.")); run; libname specs clear; ``` ```sas! /*Assign protocol devidation filepath to fileref pdpath*/ filename pdpath "\\sas\repos\&studyid.\source\pdev\C-200-001 CTMS- Protocol Deviations - Study Level 08Aug2022-Current.xlsx"; /*Read protocol devidation file*/ proc import datafile=pdpath /*"&pdpath." */ out=work.raw_pdlist dbms=xlsx replace; getnames=YES; range="09Aug22-Present PDs$A1:P9"; options validvarname=v7; run; ``` ::: --- ## .sas7bat to .xpt file :::success * **Core functions**: `LIBNAME xport` `PROC COPY` * **SAS file**: [qc_dm.sas](https://drive.google.com/file/d/1ZhshGGbyhn203Kg5FVraGx_nbgTtQlHF/view?usp=drive_link) ```sas! /*Save .sas7bat file as a .xpt file Output file name is defined by libname Input file and its library are defined by PROC COPY*/ libname xport_1 xport "\\sas\repos\&studyid.\qc\sdtm\datasets\xpt\&xptFileName..xpt"; PROC COPY in = qsdtm out = xport_1 memtype = data; SELECT &datasetName.; RUN; /*Create a temporarry dataset for copying as xpt file Keep its name <= 8 characters*/ data work.&supplementalDomainName.(LABEL = &&&supplementalDomainName._datasetLabel); set qsdtm.&supplementalDatasetName.; run; proc contents data=work.&supplementalDomainName.;run; libname xport_2 xport "\\sas\repos\&studyid.\qc\sdtm\datasets\xpt\&supplementalXptFileName..xpt"; PROC COPY in = work out = xport_2 memtype = data; SELECT &supplementalDomainName.; RUN; ``` ::: --- ## Join a SDTM parent domain dataset and its supplemental (--SUPPxx) domain dataset :::success * **Core functions**: `PROC SORT` `PROC TRANSPOSE` `MERGE` * **Reference**: [Joining the SDTM and the SUPPxx Datasets](https://www.lexjansen.com/pharmasug/2019/PO/PharmaSUG-2019-PO-282.pdf) * How is a parent domain related to its supplemental domain? A parent domain contains variables in CDISC SDTM standard. Its --SUPP domain contains variables that are unique to a study protocol and are NOT part of a CDISC SDTM standard. Before the creation of the --SUPP domain, create new variables in the domain working dataset. Then transpose/reshpae these variables to two additional variables QNAM and QVAL where QNAM stores the value of variable names and QVAL stores the variable values. The --SUPP domain dataset is linked (e.g., can be merged back) to the parent domain via two variable IDVAR and IDVARVAL. ```sas! *Sort parent data; proc sort data=dm; by usubjid; run; *Set up SUPPQUAL for joining to parent; proc sort data=suppdm; by usubjid; run; /*Reshape suppdm from long to wide, one row per usubjid*/ proc transpose data=suppdm out=_sdm; by usubjid; id qnam; idlabel qlabel; /*Adds a label to the variable;*/ var qval; run; *Bring the two together; data _dm; merge dm _sdm (drop=_name_ _label_); by usubjid; run; ``` ::: --- ## Delete datasets by a list, prefixes, range :::success * **Core functions**: `PROC DATASETS delete` * **Reference**: [How to delete SAS data sets](https://communities.sas.com/t5/SAS-Communities-Library/How-to-delete-SAS-data-sets/ta-p/720671) ```sas! /*----------------------- Delete temporary datasets -----------------------*/ proc datasets library=qcmeta memtype=data nolist; delete _CRF_: ; /*Delete datasets prefixed with _CRF_*/ delete CRF_: ; /*Delete datasets prefixed with CRF_*/ delete _date_values_:; /*Delete datasets prefixed with _date_values_*/ delete datasets SDTM_DM; /*Delete datasets datasets, SDTM_DM*/ delete qc_: ; run; quit; ``` ::: --- ## Loop through multiple variables ### Create new variables by formatting exisitng variables with SAS arrays and do loop :::success * **Input variables**: IEINC, IEEXC * **Core functions**: `ARRAY` `do` `dim()` * **SAS file**: [qc_ie.sas](https://drive.google.com/file/d/1YgBQ4grWdDQ4f8pw0vX_6-RGlpbZdWl2/view?usp=drive_link) ```sas! /*Read CRF.IE*/ data work.CRF_IE; set CRF.IE; /*Format variables IEINC1-IEINC12 with $CL_IEINCF_IETESTCD.*/ /*An array that references existing variables to format*/ array IEINC_numeric {*} IEINC1-IEINC12; /*An array to create new variable names for the formatted variables*/ array IEINC_character {*} $ t_IEINC1-t_IEINC12; do i=1 to dim(IEINC_numeric); if IEINC_numeric{i} not=. then IEINC_character{i}=put(put(IEINC_numeric{i},CL_IEINCF.),$CL_IEINCF_IETESTCD.); end; /*Format variables IEEXC1-IEEXC25 with $CL_IEEXCF_IETESTCD.*/ /*An array that references existing variables to format*/ array IEEXC_numeric {*} IEEXC1-IEEXC25; /*An array to create new variable names for the formatted variables*/ array IEEXC_character {*} $ t_IEEXC1-t_IEEXC25; do j=1 to dim(IEEXC_numeric); if IEEXC_numeric{j} not=. then IEEXC_character{j}=put(put(IEEXC_numeric{i},CL_IEEXCF.),$CL_IEEXCF_IETESTCD.); end; drop i j IEINC: IEEXC:; run; ``` ::: --- ### Write repetitive IF-ELSE statements in data step using a SAS macro :::success * **Core functions**: `%do` `%sysfunc()` `countw()` `%scan()` `%STR()` `%end` Write one set of IF-ELSE statement per VieDoc numeric variable to apply format to them to create SAS character variables * **SAS file**: [qc_pe.sas](https://drive.google.com/file/d/16BM4iV06fapuAbwb6Fke7Pw26JY3H5Uw/view?usp=drive_link) ```sas! /*A macro to convert VieDoc numeric variables to SAS character variables (1) VieDoc numeric variables have DataType=Char in DOMAIN_VLM sheet. Including those with DataType=Num will result in an error that CL_*F format is not found. (2) Output new variables will have suffix "_t"*/ %macro VieDoc_numeric_to_SAS_character(input_variables=); /*Make macro variables created in the %do loop local*/ %local i var; %do i=1 %to %sysfunc(countw(&input_variables.)); %let var=%scan(&input_variables., &i., %STR( )); %put Converting VieDoc numeric variable &var. to a SAS character variable &var._t; if &var. not=. then &var._t=put(&var., CL_&var.F.); else &var._t=""; %end; %mend VieDoc_numeric_to_SAS_character; data work.CRF_BCVA; set CRF.BCVA; /*Restore text values for numeric variables*/ %VieDoc_numeric_to_SAS_character(input_variables=BCVAYN BCVARYN BCVALYN BCVARNO BCVALNO BCVAPINR BCVAPINL BCVALOGMARR BCVALOGMARL BCVASPR BCVASPL BCVALOSSR BCVALOSSL); run; ``` ```sas! /*A macro to convert VieDoc numeric variables to SAS numeric variables (1) VieDoc numeric variables have DataType=Char in DOMAIN_VLM sheet. Including those with DataType=Num will result in an error that CL_*F format is not found. (2) Output new variables will have suffix "_n"*/ %macro VieDoc_numeric_to_SAS_numeric(input_variables=); /*Make macro variables created in the %do loop local*/ %local i var; %do i=1 %to %sysfunc(countw(&input_variables.)); %let var=%scan(&input_variables., &i., %STR( )); %put Converting VieDoc numeric variable &var. to a SAS numeric variable &var._n; if &var. not=. then &var._n=input(put(&var., CL_&var.F.),8.); else &var._n=.; %end; %mend VieDoc_numeric_to_SAS_numeric; ``` ::: --- ### Assign first nonmissing value from multiple variables to a new variable :::success * **Input variables**: AEACT1-AEACT4, AEOTHACT1-AEOTHACT4 * **New variables**: AEACN, AEACNOTH * **Core functions**: `CMISS()` `COALESCEC()` * **SAS file**: [qc_ae.sas](https://drive.google.com/file/d/1ePH9ly4IZzaVh1SRJVYX9ElqBQcZGnLf/view?usp=drive_link) ```sas! /*------------- Create AEACN If one Action selected out of CRF.AE.AEACT1, CRF.AE.AEACT2 or CRF.AE.AEACT4: then CRF.AE.AEACT1, CRF.AE.AEACT2 or CRF.AE.AEACT4 depending on which of these ticked; If more than one Action selected: "MULTIPLE" -------------*/ /*Count number of non-missing numeric values across AEACT:*/ _n_nonmiss_AEACT=3 - CMISS(AEACT1, AEACT2, AEACT4); /*Assign values to AEACN*/ if _n_nonmiss_AEACT=0 then AEACN=""; else if _n_nonmiss_AEACT=1 then AEACN= COALESCEC(AEACT1, AEACT2, AEACT4); else if _n_nonmiss_AEACT >1 then AEACN= "MULTIPLE"; /*------------- Create AEACNOTH -------------*/ /*Count number of non-missing values across AEOTHACT: */ _n_nonmiss_AEOTHACT=4-CMISS(AEOTHACT1,AEOTHACT2,AEOTHACT3,AEOTHACT4); /*Assign values to AEACNOTH*/ if _n_nonmiss_AEOTHACT=0 then AEACNOTH=""; else if _n_nonmiss_AEOTHACT=1 then AEACNOTH=COALESCEC(AEOTHACT1, AEOTHACT2, AEOTHACT3, AEOTHACT4); else if _n_nonmiss_AEOTHACT >1 then AEACNOTH= "MULTIPLE"; ``` ::: --- ### Split a text variable into multiple columns of 200 characters without chopping a word :::success * **Input variables**: COVAL * **New variables**: COVAL, COVAL1, COVAL2 * **Core functions**: `compbl()` `scan()` `length()` `proc transpose` The goal is to split CO.COVAL into CO.COVAL, CO.COVAL1, CO.COVAL2. * **SAS file**: [qc_co.sas](https://drive.google.com/file/d/10TgTpdpfgEr-KNhraPxkjlEaSvi1xpkU/view?usp=drive_link) * **Reference**: [How to split a variable into 200 Character without chopping a word dynamically](https://communities.sas.com/t5/SAS-Programming/How-to-split-a-variable-into-200-Character-without-chopping-a/td-p/614640) ```sas! /*Split COVAL into multiples of 200 characters without chopping words*/ proc sort data=work.&domainName.; by USUBJID IDVARVAL; run; %let variable_to_split= COVAL; data work.tmp ;/*(drop=str i word)*/ set work.&domainName.; length nystr $200; /*The COMPBL function removes multiple blanks in a character string by translating each occurrence of two or more consecutive blanks into a single blank.*/ str = compbl(&variable_to_split.); do i = 1 to countw(str,' '); word = scan(str,i,' '); if length(nystr) + length(word) + 1 > 200 then do; output; nystr = word; end; else nystr = catx(' ',nystr,scan(str,i,' ')); end; if nystr ne '' then output; run; /*NOTE: The data set WORK.TMP has 10 observations and 14 variables.*/ /*Split string > 200 characters into multiple variables (COL1, COL2, COL3,...)*/ proc transpose data=tmp out=want (drop=_name_); by USUBJID IDVARVAL; var nystr; run; /*NOTE: There were 10 observations read from the data set WORK.TMP.*/ /*NOTE: The data set WORK.WANT has 8 observations and 5 variables.*/ /*Merge COL1-COL3 back to input dataset*/ proc sql; create table WORK.&domainName._2(drop=_:) as select a.*, b.* from WORK.&domainName.(drop=COVAL) as a left join WORK.WANT(rename=(USUBJID=_USUBJID IDVARVAL=_IDVARVAL COL1=COVAL COL2=COVAL1 COL3=COVAL2)) as b on a.USUBJID=b._USUBJID and a.IDVARVAL=b._IDVARVAL order by USUBJID, IDVARVAL; quit; /*NOTE: Table WORK.CO_2 created, with 8 rows and 12 columns.*/ ``` ::: :::success * **Input variables**: DSTERM * **New variables**: DSTERM, DSTERM1, DSTERM2 * **Core functions**: `compbl()` `scan()` `length()` `proc transpose` The goal is to split DS.DSTERM into CO.DS.DSTERM, DS.DSTERM1, and DS.DSTERM2. The input data is WORK.DSSTDTC_EPOCH_2. The variable to split is DSTERM. The newly created columns COL1 storing first 200 characters, COL2 storing second 200 characters and COL3 storing third 200 characters. A possible mistake is to create DSTERM setting its length to 200, which COL2 and COL3 won't be generated * **SAS file**: [qc_ds.sas](https://drive.google.com/file/d/1epJ28QdxE_s-3Zh9F4nQHyYYTvo7EckI/view?usp=drive_link) * **Reference**: [How to split a variable into 200 Character without chopping a word dynamically](https://communities.sas.com/t5/SAS-Programming/How-to-split-a-variable-into-200-Character-without-chopping-a/td-p/614640) ```sas! data work.tmp (drop=str i word); set WORK.DSSTDTC_EPOCH_2; length nystr $200; /*The COMPBL function removes multiple blanks in a character string by translating each occurrence of two or more consecutive blanks into a single blank.*/ str = compbl(DSTERM); do i = 1 to countw(str,' '); word = scan(str,i,' '); if length(nystr) + length(word) + 1 > 200 then do; output; nystr = word; end; else nystr = catx(' ',nystr,scan(str,i,' ')); end; if nystr ne '' then output; run; /*NOTE: The data set WORK.TMP has 729 observations and 38 variables.*/ /*Split string > 200 characters into multiple variables (COL1, COL2, COL3,...)*/ proc transpose data=tmp out=want (drop=_name_); by SubjectId DSSEQ; var nystr; run; /*NOTE: There were 729 observations read from the data set WORK.TMP.*/ /*NOTE: The data set WORK.WANT has 727 observations and 5 variables.*/ /*Merge COL1-COL3 back to input dataset*/ proc sql; create table WORK.DSSTDTC_EPOCH_DSTERM(drop=_:) as select a.*, b.* from WORK.DSSTDTC_EPOCH_2(drop=DSTERM) as a left join WORK.WANT(rename=(SubjectId=_SubjectId DSSEQ=_DSSEQ COL1=DSTERM COL2=DSTERM1 COL3=DSTERM2)) as b on a.SubjectId=b._SubjectId and a.DSSEQ=b._DSSEQ order by SubjectId, DSSEQ; quit; /*NOTE: Table WORK.DSSTDTC_EPOCH_DSTERM created, with 727 rows and 39 columns.*/ ``` ::: --- ## Create earliest and latest possible datetime for --STDTC variables ```sas! /*Set --STDTC to a macro variable*/ %let variable=DSSTDTC; /*Create earliest, latest possible datetime for --STDTC*/ data WORK._DS_DM_DSSPID_EPOCH; set WORK.DS_DM_DSSPID(rename=(&variable.=&variable._temp)); /*Remove -NK from --STDTC*/ &variable.= prxchange('s/-NK//',-1, &variable._temp); /*Create earliest, latest possible datetime in --STDTC*/ if length(&variable.)=4 then do; &variable._min=trim(&variable.)||"-01-01T00:00"; &variable._max=trim(&variable.)||"-12-31T23:59"; end; else if length(&variable.)=7 then do; &variable._min=input(cats(&variable.,"-01T00:00"),E8601DT17.); &variable._max=input(cats(put(intnx('month',input(strip(&variable.)||"-01",yymmdd10.),1)-1,E8601DA10.),"T23:59"),E8601DT17.); end; else if length(&variable.)=10 then do; &variable._min=trim(&variable.)||"T00:00"; &variable._max=trim(&variable.)||"T23:59"; end; else if length(&variable.)>10 then do; &variable._min=&variable.; &variable._max=&variable.; end; run; ``` --- ## Communication about variable derivation, discrepancy :::warning * **Variables**: SDTM.SE.TAETORD * **Issues**: SDTM.SE.TAETORD=3 email draft Hi Gina, Hope you are doing well. I have investigated why my QSDTM.QC_SE (440 obs) has one more obs than SDTM.SE (439 obs). The commentary in my program shows they were compared equal before. The obs that is not found in SDTM.SE is highlighted: ![](https://i.imgur.com/QmfvDNg.png) I have checked how Mel derived TAETORD in her se.sas as well as the spec for C-100. Mel inner joined SDTM.DM and SDTM.SE by USUBJID to bring ETCD into a working dataset. Then TAETORD is sourced from SDTM.TA by a left join of the dataset with SDTM.TA by ETCD=ETCD. That is quite different from my program that derives TAETORD in different subsets where each subset has 1 TAETORD value and stacks them as one dataset. Maybe my subset with TAETORD=3 is not correctly programmed on my side. Can you recall what variables are involved in the Condition to create TAETORD=3 as highlighted below? That is from the SE_Meta sheet. ![](https://i.imgur.com/BVv4xtC.png) Here is how I create TAETORD=3. First, SDTM.DM is left joined with SDTM.TA by ARMCD=ARMCD: ![](https://i.imgur.com/jkTr6IL.png) then a subset with TAETORD=3 is created by the where= clause as highlighted: ![](https://i.imgur.com/6Hwo3qY.png) I am not sure if the subsetting condition has done 100% of the condition in the spec. Can you take a look? I have attached the spec file, Mel's se.sas and my qc_se.sas ::: --- :::warning * **Files**: SDTM specification file * Issues: * Update `\\sas\repos\Surface_C-200-001\source\meta\C-200-001_sdtm.xlsx` and make a copy to `\\sas\repos\Surface_C-200-001\spec\sdtm\C-200-001_sdtm.xlsx`. Also make a copy to `https://greenlightclinical.sharepoint.com/:f:/r/teams/Stats/Projects/Noymed_sponsor_folder/Surface_C-200-001/Stats/1103_Analysis/110301_Data Definitions for A Datasets/SDTM/Specs/C-200-001_sdtm.xlsx` so that NoyMed has the latest. * When u send a version of it to the client, just before sending it, make a copy of it and call the copy v1.00 or v1.0. & if u want the date also, e.g. v1_2022-10-06 or whenever it is due to be sent. ::: ---