---
title: 'Tâm - Setup MIMIC - 03/01/20'
tags: BKHCMUT, Research, Cao Hoang Tru, Vo Thi Ngoc Chau
---
Setup MIMIC - 03/01/20
===
## Table of Contents
[TOC]
## Downloading MIMIC data to local PostGre
- Data files Link https://physionet.org/content/mimiciii/1.4/
- Download via command line & web browser -> too slow
- Download via Amazon s3: Access permission error (https://registry.opendata.aws/mimiciii/)
- Install https://docs.aws.amazon.com/cli/latest/userguide/install-macos.html
- configure: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html#cli-quick-configuration
```
AKIATUNWTO2R77CZSHNM
p2RnhKyc9zJndRa8Z+nmVOoxqedLXkSIjCrRxgbc
```
- Run:
```
/Users/macmac/bin/aws s3 sync s3://mimic-iii-physionet ~/../../../Volumes/2Drive/MIMICiii-1.4/AWS
```
error: `fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied`
- Using Amazon Athena (we don't use) https://aws.amazon.com/blogs/big-data/perform-biomedical-informatics-without-a-database-using-mimic-iii-data-and-amazon-athena/
- **Access via cloud** (https://mimic.physionet.org/gettingstarted/cloud/) granting access to Google Cloud account, then BigQuery can be used directly on the data on the cloud [demo here](https://mimic.physionet.org/tutorials/intro-to-mimic-iii-bq/). go to bigquery [here](https://console.cloud.google.com/bigquery?authuser=2&folder=&project=mimic-iii-hcmut&j=bq:US:bquxjob_22bfaa47_16f6bf3307e&page=queryresults)
- Download via [Google Cloud Storage Browser:](https://console.cloud.google.com/storage/browser/mimiciii-1.4.physionet.org?authuser=2) ~30 mins for large files (~4GB), download concurrently ~60 mins
- Đối với file downloads của Tâm, csv files download trực tiếp bị corrupted. Phải dùng .zip file và gunzip thì mới có file csv dùng được.
## Migrate PostGreSQL to SSD for more storage
- **finding current data directory:** `psql -U postgres`
- to move data to new directory in SSD
```BASH=
sudo rsync -av /Library/PostgreSQL/12/data ~/../../../Volumes/APFS/postgres-data
```
- Edit postgres.conf file in `/var/postgres/`. Change `data_directory` to new value
- Then stop the old server and start the new data server:
```BASH=
# Stop current data server
sudo -u postgres PG_CTL -D /Library/PostgreSQL/12/data stop
# Start new server
sudo -u postgres PG_CTL -D ~/../../../Volumes/APFS/postgres-data/data start
# See the status of old and new servers
sudo -u postgres PG_CTL -D /Library/PostgreSQL/12/data status
sudo -u postgres PG_CTL -D ~/../../../Volumes/APFS/postgres-data/data status
```
**We need to stop old server and start new server everytime machine restarts in order to use new data directory**
## Create MIMIC PostGres database
- Follow https://mimic.physionet.org/tutorials/install-mimic-locally-ubuntu/
- Create user mimicuser
`CREATE USER mimicuser WITH PASSWORD 'jw8s0F4' CREATEDB SUPERUSER;`
- `\dt *.*` allows to see the relations whereas just `\dt` will not show any relations because we are using mimic schema which is not public.
- Because there was an cpu error when loading the data into tables, my computer restarts and some tables are still empty. I created a custom script based on setup scripts to load remaining data into the tables.
- **Problems with load_data because of not UTF-8 encoding:**
>github issue link
>https://github.com/MIT-LCP/mimic-code/issues/662
>
>**I fixed the problem by running load data in Mac Safe Mode and prevent crashing**
```bash=
#Try to convert encoding using terminal
iconv -f utf-8 -t ascii INPUTEVENTS_CV_1.csv > INPUTEVENTS_CV.csv
iconv -f ascii -t utf-8 D_ICD_DIAGNOSES_2.csv > D_ICD_DIAGNOSES.csv
iconv -f ascii -t utf-8 D_ICD_PROCEDURES_2.csv > D_ICD_PROCEDURES.csv
iconv -f ascii -t utf-8 INPUTEVENTS_CV_2.csv > INPUTEVENTS_CV.csv
iconv -f ascii -t utf-8 INPUTEVENTS_MV_2.csv > INPUTEVENTS_MV.csv
iconv -f ascii -t utf-8 LABEVENTS_2.csv > LABEVENTS.csv
iconv -f ascii -t utf-8 MICROBIOLOGYEVENTS_2.csv > MICROBIOLOGYEVENTS.csv
iconv -f ascii -t utf-8 NOTEEVENTS_2.csv > NOTEEVENTS.csv
iconv -f ascii -t utf-8 OUTPUTEVENTS_2.csv > OUTPUTEVENTS.csv
iconv -f ascii -t utf-8 PATIENTS_2.csv > PATIENTS.csv
iconv -f ascii -t utf-8 PRESCRIPTIONS_2.csv > PRESCRIPTIONS.csv
iconv -f ascii -t utf-8 PROCEDUREEVENTS_MV_2.csv > PROCEDUREEVENTS_MV.csv
iconv -f ascii -t utf-8 PROCEDURES_ICD_2.csv > PROCEDURES_ICD.csv
iconv -f ascii -t utf-8 SERVICES_2.csv > SERVICES.csv
iconv -f ascii -t utf-8 TRANSFERS_2.csv > TRANSFERS.csv
mv D_ICD_DIAGNOSES.csv D_ICD_DIAGNOSES_3.csv
mv D_ICD_PROCEDURES.csv D_ICD_PROCEDURES_3.csv
mv INPUTEVENTS_CV.csv INPUTEVENTS_CV_3.csv
mv INPUTEVENTS_MV.csv INPUTEVENTS_MV_3.csv
mv LABEVENTS.csv LABEVENTS_3.csv
mv MICROBIOLOGYEVENTS.csv MICROBIOLOGYEVENTS_3.csv
mv NOTEEVENTS.csv NOTEEVENTS_3.csv
mv OUTPUTEVENTS.csv OUTPUTEVENTS_3.csv
mv PATIENTS.csv PATIENTS_3.csv
mv PRESCRIPTIONS.csv PRESCRIPTIONS_3.csv
mv PROCEDUREEVENTS_MV.csv PROCEDUREEVENTS_MV_3.csv
mv PROCEDURES_ICD.csv PROCEDURES_ICD_3.csv
mv SERVICES.csv SERVICES_3.csv
mv TRANSFERS.csv TRANSFERS_3.csv
mv INPUTEVENTS_CV.csv INPUTEVENTS_CV_3.txt
#Check file encoding
file D_ICD_DIAGNOSES.csv
file -I D_ICD_DIAGNOSES.csv
mv INPUTEVENTS_CV_3.csv INPUTEVENTS_CV.csv
mv INPUTEVENTS_MV_3.csv INPUTEVENTS_MV.csv
mv LABEVENTS_3.csv LABEVENTS.csv
mv MICROBIOLOGYEVENTS_3.csv MICROBIOLOGYEVENTS.csv
mv NOTEEVENTS_3.csv NOTEEVENTS.csv
mv OUTPUTEVENTS_3.csv OUTPUTEVENTS.csv
mv PATIENTS_3.csv PATIENTS.csv
mv PRESCRIPTIONS_3.csv PRESCRIPTIONS.csv
mv PROCEDUREEVENTS_MV_3.csv PROCEDUREEVENTS_MV.csv
mv PROCEDURES_ICD_3.csv PROCEDURES_ICD.csv
mv SERVICES_3.csv SERVICES.csv
mv TRANSFERS_3.csv TRANSFERS.csv
```
> Server character set https://www.postgresql.org/docs/current/multibyte.html#id-1.6.10.5.7
> https://www.postgresql.org/docs/9.3/multibyte.html
> https://stackoverflow.com/questions/4867272/invalid-byte-sequence-for-encoding-utf8
> copy with custom encoding
> https://stackoverflow.com/questions/4867272/invalid-byte-sequence-for-encoding-utf8
> http://pgdocptbr.sourceforge.net/pg82/multibyte.html
>
>Converting iconv
>https://www.tecmint.com/convert-files-to-utf-8-encoding-in-linux/
> fix_latin --strict-utf8
> https://www.endpoint.com/blog/2017/07/21/postgres-migrating-sqlascii-to-utf-8
>
> us-ascii to utf-16 to utf-8
> iso-8859-1
> https://stackoverflow.com/questions/11303405/force-encode-from-us-ascii-to-utf-8-iconv
>
`iconv -f us-ascii -t utf-16 TRANSFERS_2.csv > TRANSFERS_4.csv
`
`iconv -f utf-16be -t utf-8 TRANSFERS_4.csv > TRANSFER.csv
`
- Seeing non-ASCII character
`pcregrep --color='auto' -n "[\x80-\xFF]" TRANSFERS.csv
`
- **Deleting non-ASCII character**
> https://unix.stackexchange.com/questions/475548/removing-all-non-ascii-characters-from-a-workflow-file
>
`LC_ALL=C tr -dc '\0-\177' <TRANSFERS_5.csv >TRANSFERS.csv
`
- **Converting to UTF-8 using Java**
https://dzone.com/articles/the-programmers-way-to-convert-excel-to-csv
https://www.mkyong.com/java/how-to-write-utf-8-encoded-data-into-a-file-java/
- **Convertin using C++**
https://www.example-code.com/vcpp/charset_convert_file_from_utf8_to_ansi.asp
https://rupinderjeetkaur.wordpress.com/2014/06/20/run-a-cc-program-on-terminal-using-gcc-compiler/
```bash=
sudo g++ -o converter chilkatConverter.cpp
```
Errors:
`chilkatConverter.cpp:1:10: error: 'CkCharset.h' file not found with <angled>
include; use "quotes" instead`
`ld: symbol(s) not found for architecture x86_64`
https://medium.com/@m.muizzsuddin_25037/error-ld-symbol-not-found-for-architecture-x86-64-a5e5b648ffc
==================================================
```bash=
./converter
```
==================================================
unlocking global:
`sudo gcc –o unlock unlock.c`
error:
```
clang: error: no such file or directory: '–o'
clang: error: no such file or directory: 'unlock'
```
**I'm creating a new db named 'mimiciii' with schema 'mimic' and load data again**
`psql 'dbname=mimiciii user=mimicuser options=--search_path=mimic' -f postgres_load_data.sql -v mimic_data_dir='mimic-iii-clinical-database-1.4'`
but my computer crashed and restarted again. Then I ran into the same error of UTF-8 encoding when trying to fill the remaining tables
**I fixed the problem by running load data in Mac Safe Mode and prevent crashing**
## Tìm hiểu MIMIC text
- **Tổng quát MIMIC database:**
- MIMIC-III v1.4
- Gồm 40 tables và tổng cộng 728,556,685 rows và 534 columns
- Thông tin chi tiết về schema tại [đây](https://mit-lcp.github.io/mimic-schema-spy/index.html)
- Thông tin thật của hơn 40 ngàn bệnh nhân tại Beth-Isreal Deaconess Medical Center critical care unit
- Thông tin bao gồm demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (both in and out of hospital).
- **MIMIC text containing caregivers' notes:**
- [mimic.mimiciii.noteevents](https://mit-lcp.github.io/mimic-schema-spy/tables/noteevents.html): column text contains note as text string
- Sampling command for mimiciii.noteevents text column
- all columns with header:`\COPY (SELECT * FROM mimiciii.noteevents LIMIT 1 OFFSET 107) TO '~/../../Volumes/APFS/MIMIC/sample_noteeventstext_107.csv' csv header;`
- only text column with header : `\COPY (SELECT text FROM mimiciii.noteevents LIMIT 5 OFFSET 107) TO '~/../../Volumes/APFS/MIMIC/sample_noteeventstext_107.csv' csv header;`
- exporting all column text ` \COPY (SELECT text FROM mimiciii.noteevents) TO '~/../../Volumes/APFS/MIMIC/all_noteeventstext.csv' csv header;`
- Table mimic.mimiciii.chartevents
-
- **Label của MIMIC text:**
## References
:::info
**Find this document incomplete?** Leave a comment!
:::
###### tags: `Sentiment Analysis` `MIMIC clinical text` `BKHCMUT`