Mimic-code: Could not stat file chartevents.csv unknown error

Created on 29 Oct 2018  ·  25 Comments  ·  Source: MIT-LCP/mimic-code

Prerequisites

When I run the postgres_load_data script, the first three tables are loaded and then I get the message: could not stat file CHARTEVENTS.csv: unknown error. Has anyone run into this and can help?

All 25 comments

Have you checked the integrity of your copy of chartevents.csv using the checksum files provided on the download page for the project? Perhaps it was corrupted during download or decompression.
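The integrity check suggested above can be sketched in Python for systems without an md5sum tool. This is a minimal sketch, assuming the checksum file uses md5sum-style lines ("<hex digest>  <filename>"); the function names are hypothetical and not part of the mimic-code repository.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file, data_dir):
    """Compare each listed file against its expected MD5.

    Assumption: md5sum-style lines, "<hex digest>  <filename>".
    Returns a dict mapping filename to "ok", "mismatch" or "missing".
    """
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected = parts[0].lower()
        name = parts[1].strip().lstrip("*")  # md5sum marks binary mode with "*"
        target = Path(data_dir) / name
        if not target.is_file():
            results[name] = "missing"
        elif md5_of_file(target) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

A "missing" result for CHARTEVENTS.csv would point to a failed extraction, while "mismatch" would point to a corrupted download.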

Yes, I used command md5 checksum_md5_zipped.txt and everything is OK with all tables...

I also tried with the zipped data and ran the postgres_load_data_7zip script. In that case I get: unquoted newline found in data. Hint: Use quoted CSV field to represent newline.

I also checked md5 checksum_md5_unzipped.txt and everything is ok.

It sounds as though there is a mismatch between the script you're running and the data you have. I'd make sure:

  1. All of the files are in the same directory
  2. All of the files have the same file extension; e.g. they are all .csv.gz
  3. You are running the postgres_load_data_7zip.sql file either (i) from the same folder or (ii) after configuring the mimic_data_dir to point to the data directory.

Past that it's really hard to debug remotely without more info like a screenshot of your folder setup, your system information, the exact commands you ran and the exact error message.
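The first two items of the checklist above can be checked quickly with a short script. This is only a sketch: summarize_extensions is a hypothetical helper, and it only distinguishes the two extensions mentioned in this thread (.csv and .csv.gz).

```python
from collections import Counter
from pathlib import Path


def summarize_extensions(data_dir):
    """Count the .csv and .csv.gz files sitting directly inside data_dir."""
    counts = Counter()
    for entry in Path(data_dir).iterdir():
        if not entry.is_file():
            continue
        name = entry.name.lower()
        if name.endswith(".csv.gz"):
            counts[".csv.gz"] += 1
        elif name.endswith(".csv"):
            counts[".csv"] += 1
    return dict(counts)
```

If the result contains both keys, the folder mixes extracted and compressed files, and one of the two load scripts will fail to find some of its inputs.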

Hello,

Thank you for your answer.

  1. All files are in the same directory
  2. All files have the same file extension csv
  3. I am running the postgres_load_data.sql file after configuring the mimic_data_dir to point to the data directory.
    Here are my exact commands and the error I got:
    [screenshots: step1, step2, system_information]

Great that's very helpful, thanks for the additional info. I think it's as simple as the file not being in the folder. Can you double check that your folder C:/Users/Lejla/Desktop/MIMICIII has the CHARTEVENTS.csv file?

It may be that you tried to extract all the compressed files, but it failed for chartevents and so you only have a .csv.gz file (reasons could be because the extracted file is 33GB and you ran out of space, or your file system is FAT32 (!), or who knows). In that case, you may want to edit the load script to load it directly from .csv.gz. You can do that by replacing:

\copy CHARTEVENTS from 'CHARTEVENTS.csv' delimiter ',' csv header NULL ''

with

\copy CHARTEVENTS from PROGRAM '7z e -so CHARTEVENTS.csv.gz' delimiter ',' csv header NULL ''
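If several tables need the same change, the replacement above could be applied to the whole load script at once. The following is a minimal sketch, assuming every load line has the form \copy TABLE from 'TABLE.csv' ... as in the example above; use_compressed_input is a hypothetical helper, not part of the repository.

```python
import re

# Matches the start of lines such as:
#   \copy CHARTEVENTS from 'CHARTEVENTS.csv' delimiter ',' csv header NULL ''
COPY_FROM_CSV = re.compile(r"(\\copy\s+\w+\s+from\s+)'([^']+\.csv)'", re.IGNORECASE)


def use_compressed_input(script_text):
    """Rewrite each copy-from-CSV line to stream the matching .csv.gz through 7z."""
    return COPY_FROM_CSV.sub(
        # A function is used as the replacement so backslashes are kept literally.
        lambda m: f"{m.group(1)}PROGRAM '7z e -so {m.group(2)}.gz'",
        script_text,
    )
```

Lines that already read from PROGRAM are left untouched, because the pattern only matches a quoted .csv file name directly after "from".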

Thank you very much for your answer. This time I tried working with the zipped files and ran the script for them, and I got a different message. Perhaps it will help.
[screenshot: zip_file]

Do you mind showing the contents of the directory?

I do not mind. Here is the content of my folder:
[screenshot: directory]

Okay, the could not stat file "CHARTEVENTS.csv": Unknown error message is actually caused by a bug in PostgreSQL 11 on Windows. Under the hood, COPY calls fstat() to make sure the file is not a directory, and unfortunately the fstat() used there stores the file size in a 32-bit field, so it can't handle large files like CHARTEVENTS.csv. I tested the build on Windows with PostgreSQL 10.5 and I didn't get this error, so I think the bug is fairly new.

The best workaround is to keep the files compressed (i.e. keep them as .csv.gz files) and use 7zip to load in the data directly from compressed files. In testing this seemed to still work. There is a pretty detailed tutorial on how to do this here: https://mimic.physionet.org/tutorials/install-mimic-locally-windows/

The brief version of above is that you keep the .csv.gz files, you add the 7zip binary to your windows environment path, and then you call the postgres_load_data_7zip.sql file to load in the data. You can use the postgres_checks.sql file after everything to make sure you loaded in all the data correctly.
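Before running the 7zip load script, it may be worth confirming that the 7zip binary is actually reachable through the PATH. This is a small sketch; the candidate names "7z" and "7za" are assumptions about how the binary is named on a given install.

```python
import shutil


def find_7zip(candidates=("7z", "7za")):
    """Return the full path of the first 7zip binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_7zip()
    print(location if location else "7zip not found on PATH")
```

If this prints "7zip not found on PATH", the \copy ... from PROGRAM '7z e -so ...' lines in the load script will fail, since the 7z command cannot be started.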

edit: For your later error, where you are using this 7zip approach, I'm not sure why it's not loading. Try redownloading just the ADMISSIONS.csv.gz file and seeing if it still throws you that same error. Maybe there is a new version of 7zip which requires me to update the script or something!

Hello,
Thank you for the detailed explanation. I installed PostgreSQL 10.5 and now the load is running. I think it will take a long time to load all the tables, but I do not get "Unknown error" anymore. Thank you very much for all the help.

Great!

Using PostgreSQL 10.11 helped me... thanks

Thanks, this worked for me:
\copy my_table_name from program 'cmd /c type input_data.csv' delimiter ',' csv header;
input_data.csv is about 11 GB in size.

The "can't copy large files" issue is present in versions 11 and 12, but version 10 is fine. Is there a way around it without compressing the data files, for example by swapping some PostgreSQL program files from v10 into v11 or v12?
Workaround:
copy t(c,d) from program 'cmd /c "type x:\pathto\file.txt"' with (format text);
This is pretty slow for my needs. I need the speed of the default COPY command.

You could consider using other command line tools to split the file into multiple files, and then loading the individual files one at a time. On unix systems this can be done using split and you could install the GNU coreutils for Windows to use it.
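For those who would rather not install coreutils, the splitting idea can also be sketched in Python. This is a minimal sketch, not a tested replacement for split: split_csv and the part naming scheme are hypothetical, and rows_per_part has to be chosen so that each part stays under the size that triggers the error.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_part):
    """Split a CSV into numbered parts, repeating the header row in each part.

    The csv module is used instead of raw line splitting so that quoted
    fields containing newlines stay inside a single record.
    """
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return parts  # empty input, nothing to split
        out, writer, count = None, None, 0
        for row in reader:
            if writer is None or count == rows_per_part:
                if out is not None:
                    out.close()
                part = out_dir / f"{source.stem}_part{len(parts) + 1:04d}.csv"
                parts.append(part)
                out = open(part, "w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                count = 0
            writer.writerow(row)
            count += 1
        if out is not None:
            out.close()
    return parts
```

Because every part repeats the header, each one can be loaded with the same \copy ... csv header line as the original file. One caveat: the csv module rewrites quoting, so a quoted empty string in the source comes out as an unquoted empty field, which a load using NULL '' would read as NULL.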

I think I have encountered the same problem as you, but I am using the very new version 12. Is there any way to solve it? Use compressed files?

Yes, if I recall correctly the compressed files are < 4 GB and you avoid this error by using the compressed load scripts (7z or gzip).
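To see which files are likely to trigger the error, the data folder can be scanned for files at or above that size. This is a sketch only: the 4 GB threshold is taken from the comment above rather than verified against the PostgreSQL source, and files_over_limit is a hypothetical helper.

```python
from pathlib import Path

# Assumption taken from this thread: files of roughly 4 GB and above fail to load.
FOUR_GIB = 4 * 1024 ** 3


def files_over_limit(data_dir, limit=FOUR_GIB, pattern="*.csv"):
    """Return sorted (name, size_in_bytes) pairs for matching files at or above limit."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(data_dir).glob(pattern)
        if path.is_file() and path.stat().st_size >= limit
    )
```

Running it with pattern="*.csv.gz" would confirm whether the compressed files really stay under the threshold on a given download.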

OK, I will try this method now, thank you very very much for your reply

So, is there no workaround that avoids compressing or splitting altogether? Could version 10's COPY command be used with the 11 or 12 engine?
As I mentioned, I need the speed of the default COPY command, but for large files on version 12, and this is vital for my needs.

Well, PostgreSQL is open source, so you're welcome to try and contribute a fix yourself :)

Here is the relevant discussion: https://www.postgresql.org/message-id/20181104000405.GA1743%40paquier.xyz

Otherwise you have the three workarounds proposed in this thread (change version, use compressed files, split the file into multiple parts). I'm sure there are other workarounds too.

Isn't the obvious fix to port the working part of v10's COPY code into 11 and 12? Or is it so hardcoded that doing so would break everything? :)

@ghYura this is a community maintained resource, so if you have suggestions for improving the codebase then I'd suggest making a pull request.

I was getting the error while loading the CSVs into the tables in both the 12.X and 13.X versions but it works like a charm in the PostgreSQL version 10.15. Thanks, everyone for the help :)
