Hi, I'm new to both MIMIC-III and PostgreSQL/pgAdmin 4.
I've been working through the Cohort selection tutorial notebook, and I think I understand the basics now.
I've been given two "blobs" of SQL queries from a previous project: one used for cohort definition, and another used for the subsequent data extraction. I've tried running separate blocks of queries from both "blobs" in pgAdmin 4, and I think I understand bits of them.
I was simply wondering how I could run it all to output tables in CSV format, one for each ICU encounter of a patient?
My end task would then be to extract 47 predefined features from each of those CSV files (plus some other preprocessing), and then to join them into a multivariate time series, which I should be able to do in MATLAB or in Python using pandas.
The dataset for the study I'm trying to reproduce is detailed in sections 8.1 and 8.2 of this paper.
I guess this might not be the right place to ask for advice like this? If so, is there any other place you could recommend for novices wanting to learn how to run SQL scripts inherited from other projects/studies? I expect this is quite a common question/task, but I couldn't find anywhere that addressed it.
Thanks very much for your help!
Well, here are a couple of tips for total newbies who might have the same problem:

- In the pgAdmin 4 `query tool`, try finishing each block with a semicolon. The problem is that it seems only the last query is actually executed in the Data Output panel - I guess you need to use a `UNION` or `JOIN` clause to merge them somehow?
- To write results out to CSV, use the `COPY` statement, see #214.

I don't think I can provide a general solution, but perhaps I can point you in the right direction. My favorite way to work is to write modular SQL scripts which create materialized views of the data for particular concepts (e.g. ventilation). I then combine all of these views together at the end to make one big table, and I output that to CSV or read it directly into Python. If you are looking to learn SQL then I'm sure there are many tutorials online to help with that. In particular, I would read up on materialized views, as they are very useful for creating intermediate tables which you can then use later (I think that's what you need in your last question).
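A minimal sketch of that workflow might look like the following. The table and column names come from the public MIMIC-III schema, but the view name, the output path, and the choice of `itemid` 220045 (taken here to be the MetaVision heart rate item) are illustrative assumptions, not part of any particular study:

```sql
-- Concept view: first-day heart rate summary per ICU stay.
-- (itemid 220045 is assumed to be heart rate; verify against d_items.)
DROP MATERIALIZED VIEW IF EXISTS hr_firstday;
CREATE MATERIALIZED VIEW hr_firstday AS
SELECT ce.icustay_id
     , MIN(ce.valuenum) AS heartrate_min
     , MAX(ce.valuenum) AS heartrate_max
FROM chartevents ce
INNER JOIN icustays ie
   ON ce.icustay_id = ie.icustay_id
  AND ce.charttime BETWEEN ie.intime AND ie.intime + INTERVAL '1 day'
WHERE ce.itemid = 220045
GROUP BY ce.icustay_id;

-- Combine views into one row per ICU stay and export to CSV.
-- (COPY ... TO writes on the server; from a psql client use \copy instead.)
COPY (
  SELECT ie.icustay_id, hr.heartrate_min, hr.heartrate_max
  FROM icustays ie
  LEFT JOIN hr_firstday hr
    ON ie.icustay_id = hr.icustay_id
) TO '/tmp/cohort_data.csv' WITH CSV HEADER;
```

Each additional concept (ventilation, vasopressors, labs, etc.) would get its own view, and the final `SELECT` just keeps joining them on `icustay_id`.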
If you look at the aline subfolder (https://github.com/MIT-LCP/mimic-code/tree/master/notebooks/aline) you can see an example of a fully reproducible clinical study. I would recommend doing something like what's done in that folder. There are a bunch of modular SQL files which generate underlying tables - you can see that I first generate a "cohort" table (aline_cohort.sql), which says "these are the `icustay_id`s I am interested in". Then I run a number of other scripts to generate concepts for these `icustay_id`s. Finally, the notebook extracts all the data from these tables (in aline.ipynb @ [7]). The notebook has gone a step further and actually runs all the above queries directly from Python. At the very least it should give you an idea of how you can build up a cohort/dataset from multiple SQL queries. I hope that helps!
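The cohort-first pattern described above can be sketched roughly like this. The view name and the inclusion criteria are made up for illustration (they are not the aline study's actual criteria); the columns come from the MIMIC-III `icustays` table:

```sql
-- Hypothetical cohort definition: one row per ICU stay of interest.
DROP MATERIALIZED VIEW IF EXISTS my_cohort CASCADE;
CREATE MATERIALIZED VIEW my_cohort AS
SELECT ie.icustay_id, ie.intime, ie.outtime
FROM icustays ie
WHERE ie.first_careunit = 'MICU'                 -- example criterion, an assumption
  AND ie.outtime - ie.intime >= INTERVAL '1 day'; -- example criterion, an assumption

-- Concept scripts then join against the cohort, so they only
-- compute values for the icustay_ids you actually care about, e.g.:
--   SELECT ...
--   FROM my_cohort co
--   INNER JOIN chartevents ce ON co.icustay_id = ce.icustay_id
--   ...
```

Defining the cohort once and joining everything else to it keeps each concept script small and makes the inclusion criteria easy to change in one place.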
Hi Alistair @alistairewj, thanks a lot for this great help, it's much appreciated :+1:
I'm investing some time in going through the indwelling arterial catheter (aline) study and your sepsis3-mimic notebooks.
Bit of a steep learning curve, but it's starting to make sense :) The *Secondary Analysis of Electronic Health Records* book is proving to be really helpful too :)
Great! Good luck!