Hi, I'm new to both MIMIC-III and PostgreSQL/pgAdmin 4.
I've been working through the Cohort selection tutorial notebook, and I think I understand the basics now.
I've been given two "blobs" of SQL queries from a previous project: one used for cohort definition, and another used for the subsequent data extraction. I've tried running separate blocks of queries from both "blobs" in pgAdmin 4, and I think I understand bits of them.
I was simply wondering how I could run it all to output tables in CSV format, one for each ICU encounter of a patient?
My end task would then be to extract 47 predefined features from each of those CSV files (plus some other preprocessing), and then to join them into a multivariate time series, which I should be able to do in MATLAB or in Python using pandas.
The dataset for the study I'm trying to reproduce is detailed in sections 8.1 and 8.2 of this paper.
I guess this might not be the right place to ask for advice like this? If so, is there any other place you could recommend for novices wanting to learn how to run SQL scripts inherited from other projects/studies? I expect this is quite a common question/task, but I couldn't find anywhere that addressed it.
Thanks very much for your help!
Well, here are a couple of tips for total newbies who might have the same problem:

- In the pgAdmin 4 `query tool`, try finishing each block with a semicolon. The problem is that it seems only the last query is actually executed in the Data Output panel - I guess you need to use a `UNION` or `JOIN` clause to merge them somehow?
- To write results out to CSV, use the `COPY` statement, see #214.

I don't think I can provide a general solution, but perhaps I can point you in the right direction. My favorite way to work is to write modular SQL scripts which create materialized views of the data for particular concepts (e.g. ventilation). I then combine all of these views together at the end to make one big table, and I output that to CSV or read it directly into Python. If you are looking to learn SQL then I'm sure there are many tutorials online to help with that. In particular, I would read up on materialized views, as they are very useful for creating intermediate tables which you can then use later (I think that's what you need in your last question).
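A minimal sketch of that workflow might look like the following. The table and column names come from the public MIMIC-III schema, but the view name, the output path, and the choice of `itemid` 220045 (taken here to be the MetaVision heart rate item) are illustrative assumptions, not part of any particular study:

```sql
-- Concept view: first-day heart rate summary per ICU stay.
-- (itemid 220045 is assumed to be heart rate; verify against d_items.)
DROP MATERIALIZED VIEW IF EXISTS hr_firstday;
CREATE MATERIALIZED VIEW hr_firstday AS
SELECT ce.icustay_id
     , MIN(ce.valuenum) AS heartrate_min
     , MAX(ce.valuenum) AS heartrate_max
FROM chartevents ce
INNER JOIN icustays ie
   ON ce.icustay_id = ie.icustay_id
  AND ce.charttime BETWEEN ie.intime AND ie.intime + INTERVAL '1 day'
WHERE ce.itemid = 220045
GROUP BY ce.icustay_id;

-- Combine views into one row per ICU stay and export to CSV.
-- (COPY ... TO writes on the server; from a psql client use \copy instead.)
COPY (
  SELECT ie.icustay_id, hr.heartrate_min, hr.heartrate_max
  FROM icustays ie
  LEFT JOIN hr_firstday hr
    ON ie.icustay_id = hr.icustay_id
) TO '/tmp/cohort_data.csv' WITH CSV HEADER;
```

Each additional concept (ventilation, vasopressors, labs, etc.) would get its own view, and the final `SELECT` just keeps joining them on `icustay_id`.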
If you look at the aline subfolder (https://github.com/MIT-LCP/mimic-code/tree/master/notebooks/aline) you can see an example of a fully reproducible clinical study. I would recommend doing something like what's done in that folder. There are a bunch of modular SQL files which generate underlying tables - you can see that I first generate a "cohort" table (aline_cohort.sql), which says "these are the `icustay_id`s I am interested in". Then I run a number of other scripts to generate concepts for these `icustay_id`s. Finally, the notebook extracts all the data from these tables (in aline.ipynb @ [7]). The notebook has gone a step further and actually runs all the above queries directly from Python. At the very least it should give you an idea of how you can build up a cohort/dataset from multiple SQL queries. I hope that helps!
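The cohort-first pattern described above can be sketched roughly like this. The view name and the inclusion criteria are made up for illustration (they are not the aline study's actual criteria); the columns come from the MIMIC-III `icustays` table:

```sql
-- Hypothetical cohort definition: one row per ICU stay of interest.
DROP MATERIALIZED VIEW IF EXISTS my_cohort CASCADE;
CREATE MATERIALIZED VIEW my_cohort AS
SELECT ie.icustay_id, ie.intime, ie.outtime
FROM icustays ie
WHERE ie.first_careunit = 'MICU'                 -- example criterion, an assumption
  AND ie.outtime - ie.intime >= INTERVAL '1 day'; -- example criterion, an assumption

-- Concept scripts then join against the cohort, so they only
-- compute values for the icustay_ids you actually care about, e.g.:
--   SELECT ...
--   FROM my_cohort co
--   INNER JOIN chartevents ce ON co.icustay_id = ce.icustay_id
--   ...
```

Defining the cohort once and joining everything else to it keeps each concept script small and makes the inclusion criteria easy to change in one place.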
Hi Alistair @alistairewj, thanks a lot for this great help, it's much appreciated :+1:
I'm investing some time in going through the indwelling arterial catheter (aline) study and your sepsis3-mimic notebooks.
Bit of a steep learning curve, but it's starting to make sense :) The *Secondary Analysis of Electronic Health Records* book is proving to be really helpful too :)
Great! Good luck!