Mimic-code: MIMIC Waveforms

Created on 10 Mar 2016  ·  13 Comments  ·  Source: MIT-LCP/mimic-code

On the MIMIC-II query builder there were a couple of tables related to the MIMIC waveform database. Is this something that will be implemented in MIMIC-III?

Also, are there any plans to update the waveform database with more matched patients and new waveforms?

All 13 comments

There are plans to update the matched database. The matching process is still ongoing.

Regarding the waveform tables, I'm not convinced that was the simplest method of distributing the matches. While we will release a map of some form, it may not be in the form of relational database tables.

I will leave this issue open for now and re-address it when there is an update about the waveforms.

I've attached a sample of matched headers for MIMIC-III patients here. If you have time, could you comment on whether this is a useful format, and whether additional information (HADM_ID, ICUSTAY_ID) would make things easier? See here for how to use matched waveform headers: http://physionet.org/physiobank/database/mimic2wdb/matched/

We do not currently plan to add tables to the MIMIC-III clinical database to match to the waveforms, but we do plan on releasing headers, such as those in the above file.

Thanks Alistair, I think the most important thing for the header would be the ICUSTAY_ID, as that indicates when the patient was admitted to the hospital. The current date listed in the headers is when the actual recording starts as opposed to the date of ICU admission. So if we have the ICUSTAY_ID, I should be able to link the rest of the patient data from there.

Could there be any cases where there is a recording but no ICUSTAY_ID associated with it?

Yes, there are. ICUSTAY_ID and the waveform records are collected independently. We have to map them back and that's not always trivial. There is a host of issues that can happen (different clocks, waveform records with erroneous medical record numbers, alignment issues, ...). Also, minor correction, the ICUSTAY_ID starts when the patient enters the ICU, not the hospital. The HADM_ID is associated with the hospital.

From my calculations around 73% of records have an ICUSTAY_ID, and 87% have an HADM_ID.

Here's a map of the above headers to ICUSTAY_ID/HADM_ID: mimic-iii-matched-waveforms-sample.xlsx

Hi,

I work with @parisni at APHP on MIMIC3 data.
I just found this CSV file: https://physionet.org/physiobank/database/mimic3wdb/matched/matched_waveform_info.csv and I would like to know whether it is the definitive version of the matches between the waveforms and the HADM_ID/ICUSTAY_ID. Also, can you explain what the 'hadm_overlap', 'icustay_overlap', 'rih' and 'rii' columns are?

The page https://mimic.physionet.org/mimicdata/waveforms/ indicates that the work is not finished yet but it seems to be finished.

In issue #166, @tompollard states that "The waveform database for MIMIC-III has not yet been released, but we are working on it." However, it seems to be available at /mimic3wdb.

Thanks! :)

Thanks for highlighting this @Dubrzr. Essentially @alistairewj created a header file to match previously released waveforms to the MIMIC-III clinical data, but no additional waveforms have been released yet. We'll update documentation etc to clarify this point.

Thanks for your answer and also for your work! :D

I am working on extracting all the data in the .hea header files and loading it into a database, and I would like to know whether it would be of interest to merge this work into this repository.

It works like this:

  1. Download all .hea files from PhysioNet into a local directory:
mimic3wdb/
  s00020/
    3544749_0001.hea
    3544749_0002.hea
    3544749_0003.hea
    3544749_0004.hea
    3544749_0005.hea
    3544749_0006.hea
    3544749_0007.hea
    3544749_0008.hea
    3544749_layout.hea
    s00020-2183-04-28-17-47.hea
    s00020-2183-04-28-17-47n.hea
  s00033/
    ....
  ....

  2. Download matched_waveform_info.csv to get information about each record
  3. Extract all information from all .hea files (each sxxxxx-yyyy-mm-dd-hh-mm{n}.hea file corresponds to one record, and each file listed in this header corresponds to one entry)
  4. Write metadata from the CSV file and the .hea files to two separate new CSV files:

    • wfr.csv, which contains one row per record

    • wfe.csv, which contains one row per entry

> wfr.csv: record_id, subject_id, starttime, endtime, starting_hadm, ending_hadm, starting_icustay, ending_icustay, hadmmatch, icumatch, rih, rii, hadm_overlap, icustay_overlap, comments
> wfe.csv: record_id, type, segment_index, start_datedatetime, end_datedatetime, nsamp, nsig, fs, fmt, sampsperframe, skew, byteoffset, gain, units, baseline, initvalue, signame, comments
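As a rough illustration of the filename convention described in the steps above, here is a minimal Python sketch that splits a record-level header name into its parts. It is not taken from the mimic3-scripts repository. The function name `parse_record_header_name`, the returned keys, and the assumption that the subject number is always five digits are all hypothetical choices made for this example.

```python
import re
from datetime import datetime

# Record-level header names look like "s00020-2183-04-28-17-47n.hea":
# subject number, start timestamp, and an optional "n" for numerics.
# Assumption: the subject number is always five digits.
RECORD_HEADER = re.compile(
    r"^s(?P<subject>\d{5})"
    r"-(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2})"
    r"(?P<numerics>n?)\.hea$"
)


def parse_record_header_name(filename):
    """Return a dict describing a record header name, or None if the
    name belongs to a segment or layout header instead."""
    match = RECORD_HEADER.match(filename)
    if match is None:
        return None
    return {
        "subject_id": int(match.group("subject")),
        "starttime": datetime.strptime(match.group("stamp"), "%Y-%m-%d-%H-%M"),
        "is_numerics": match.group("numerics") == "n",
    }
```

Segment headers such as 3544749_0001.hea and layout headers such as 3544749_layout.hea do not match the pattern, so the function returns None for them and they can be handled as entries rather than records.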

My scripts are available here: https://github.com/Dubrzr/mimic3-scripts

If you are interested in the resulting files, ask me.

Hi,

While exploring the data gathered with my script, I found erroneous dates in header files.

Only the numerics headers (s*n.hea) have this problem. For example, in https://physionet.org/physiobank/database/mimic3wdb/matched/s00052/s00052-2191-01-10-02-21n.hea, the date in the header is 14/03/3036, while the filename indicates that the date is 10/01/2191.

There are 888 numerics headers with this problem.
For the files concerned, can I assume that the date in the filename is the correct one? It seems to be consistent with the admissions table.
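A check of this kind can be sketched in a few lines of Python. This is not the linked headers_checker.py, only a minimal illustration. It assumes the header's first non-comment line carries the base date as a DD/MM/YYYY token, as in the example quoted above. The function names `header_date`, `filename_date` and `dates_agree` are hypothetical.

```python
import re
from datetime import date

# Assumption: the base date appears as a DD/MM/YYYY token on the
# record line (the first non-blank, non-comment line of the header).
DATE_TOKEN = re.compile(r"(\d{2})/(\d{2})/(\d{4})")
NAME_DATE = re.compile(
    r"^s\d+-(\d{4})-(\d{2})-(\d{2})-\d{2}-\d{2}n?(?:\.hea)?$"
)


def header_date(header_text):
    """Return the DD/MM/YYYY date from the record line, or None."""
    for line in header_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Skip the first token (the record name) and look for a date.
        for token in stripped.split()[1:]:
            match = DATE_TOKEN.fullmatch(token)
            if match:
                day, month, year = (int(g) for g in match.groups())
                try:
                    return date(year, month, day)
                except ValueError:
                    return None
        return None
    return None


def filename_date(name):
    """Return the date encoded in a record header filename, or None."""
    match = NAME_DATE.match(name)
    if match is None:
        return None
    year, month, day = (int(g) for g in match.groups())
    return date(year, month, day)


def dates_agree(name, header_text):
    """True only if both dates can be read and they are equal."""
    from_header = header_date(header_text)
    from_name = filename_date(name)
    return from_header is not None and from_header == from_name
```

Calling `dates_agree` with each record header's filename and contents would flag the mismatching files. Headers with no readable date are also reported as disagreeing, which is a deliberate but arguable choice.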

There are also header files that are totally wrong:

You can see all the files with those problems here: https://gist.github.com/Dubrzr/6a22ae48980a549cc5883f3750ec0578

The script that generated this output is here: https://github.com/Dubrzr/mimic3-scripts/blob/master/headers_checker.py

Thanks!

Thanks for the bug report. I will be fixing the data later today - it was a sloppy regex! The date in the filename is the correct one. I'll post again when the data is updated on PhysioNet.

Regarding the crazy years, there are four of them to my knowledge:

  • s27446/s27446-8838-01-26-18-03
  • s27446/s27446-8838-01-26-18-03n
  • s29799/s29799-8921-03-11-17-16
  • s29799/s29799-8921-03-11-17-16n

No idea why the years are ridiculous. Probably a bad setting on a monitor. I would just exclude them like you're doing.
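Excluding such records can be done from the record name alone. The sketch below is a minimal Python example, not code from this thread. The cutoff `max_year=2300` is an arbitrary value chosen for illustration, and the function names `record_year` and `has_plausible_year` are hypothetical.

```python
import re

# Record names look like "s27446/s27446-8838-01-26-18-03" with an
# optional trailing "n" for numerics; the year is the first date field.
RECORD_YEAR = re.compile(r"^s\d+-(\d{4})-\d{2}-\d{2}-\d{2}-\d{2}n?$")


def record_year(record_name):
    """Return the year encoded in a record name, or None if the name
    does not follow the expected pattern."""
    base = record_name.rsplit("/", 1)[-1]  # drop the directory part
    match = RECORD_YEAR.match(base)
    return int(match.group(1)) if match else None


def has_plausible_year(record_name, max_year=2300):
    """True if the year can be read and does not exceed max_year.
    max_year is an assumed cutoff, not a documented limit."""
    year = record_year(record_name)
    return year is not None and year <= max_year
```

With this cutoff, the four records listed above are rejected, while a record such as s00020/s00020-2183-04-28-17-47 is kept.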

The matched header files on PhysioNet should now be updated. Specifically, you should only need to redownload the s#####*.hea files. Let me know if you succeed with your next iteration of the script!

Regarding your scripts, I do think they'd be of interest to the community, but we'd have to think about where best to put them. For now I would tag your repository with mimic-iii and physionet which should help some.

Just a quick update: we are pleased to say that a new batch of matched waveforms is being uploaded to PhysioNet right now (~10k patients in total). Once the waveforms are uploaded and checked, they will be made available for analysis.

This is a super-exciting announcement! Thanks a lot for both of your work!

@bemoody and @cx1111 are the guys to thank for this - we'll pass on your praise!
