When I run the postgres_load_data script, the first three tables load, and then I get the message: could not stat file CHARTEVENTS.csv: unknown error. Has anyone run into this and can help?
Have you checked the integrity of your copy of chartevents.csv using the checksum files provided on the download page for the project? Perhaps it was corrupted during download or decompression.
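For anyone who wants to script the integrity check, here is a minimal Python sketch. It assumes you copy the expected MD5 value for a file out of the project's checksum file by hand; the function names are hypothetical and are not part of the MIMIC build scripts.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for a file of tens of GB.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch for one file suggests redownloading or re-extracting just that file.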
Yes, I ran the md5 check against checksum_md5_zipped.txt and everything is OK for all tables...
I also tried with the zipped data and ran the postgres_load_data_7zip script. In that case I get: unquoted newline found in data. Hint: use quoted CSV field to represent newline.
I also checked against checksum_md5_unzipped.txt and everything is OK.
It sounds as though there is a mismatch between the script you're running and the data you have. I'd make sure:
.csv.gz
Past that it's really hard to debug remotely without more info like a screenshot of your folder setup, your system information, the exact commands you ran and the exact error message.
Hello,
Thank you for your answer.
Great, that's very helpful, thanks for the additional info. I think it's as simple as the file not being in the folder. Can you double-check that your folder C:/Users/Lejla/Desktop/MIMICIII has the CHARTEVENTS.csv file?
It may be that you tried to extract all the compressed files, but it failed for chartevents and so you only have a .csv.gz file (possible reasons: the extracted file is 33 GB and you ran out of space, or your file system is FAT32 (!), or who knows). In that case, you may want to edit the load script to load it directly from the .csv.gz file. You can do that by replacing:
\copy CHARTEVENTS from 'CHARTEVENTS.csv' delimiter ',' csv header NULL ''
with
\copy CHARTEVENTS from PROGRAM '7z e -so CHARTEVENTS.csv.gz' delimiter ',' csv header NULL ''
Thank you very much for the answer. This time I tried working with the zipped files and ran the script for them. I got a different message this time... Perhaps it will help.
Do you mind showing the contents of the directory?
I do not mind. Here is the content of my folder.
Okay, the could not stat file "CHARTEVENTS.csv": Unknown error message is actually a bug in PostgreSQL 11. Under the hood it makes a call to fstat() to make sure the file is not a directory, and unfortunately on Windows fstat() is a 32-bit call which can't handle large files like chartevents. I tested the build on Windows with PostgreSQL 10.5 and I didn't get this error, so I think it's fairly new.
The best workaround is to keep the files compressed (i.e. keep them as .csv.gz files) and use 7zip to load the data directly from the compressed files. In testing this seemed to still work. There is a pretty detailed tutorial on how to do this here: https://mimic.physionet.org/tutorials/install-mimic-locally-windows/
The brief version of the above is that you keep the .csv.gz files, you add the 7zip binary to your Windows environment path, and then you call the postgres_load_data_7zip.sql file to load in the data. You can use the postgres_checks.sql file after everything to make sure you loaded in all the data correctly.
edit: For your later error, where you are using this 7zip approach, I'm not sure why it's not loading. Try redownloading just the ADMISSIONS.csv.gz file and seeing if it still throws you that same error. Maybe there is a new version of 7zip which requires me to update the script or something!
Hello,
Thank you for the detailed explanation. I installed PostgreSQL 10.5 and now the process is running. I think it will take a long time to load all the tables, but I do not get "Unknown error" anymore. Thank you very much for all the help.
Great!
Using PostgreSQL 10.11 helped me... thanks
Thanks, this worked for me:
\copy my_table_name from program 'cmd /c type input_data.csv' delimiter ',' csv header;
input_data.csv is about 11 GB in size.
The "can't copy large files" issue affects versions 11 and 12, but version 10 is fine. Is there a way around it without compressing the data files, perhaps by swapping some PostgreSQL program files from version 10 into 11 or 12?
Workaround:
copy t(c,d) from program 'cmd /c "type x:\pathto\file.txt"' with (format text);
This is pretty slow for my needs. I need the speed of the default COPY command.
You could consider using other command line tools to split the file into multiple files, and then loading the individual files one at a time. On Unix systems this can be done using split, and you could install the GNU coreutils for Windows to use it.
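If installing extra tools is not an option, the splitting idea can also be sketched in Python. This is only an illustration under stated assumptions: the function name, the part-file naming scheme, and the UTF-8 encoding are all hypothetical choices. It uses the csv module so that quoted fields containing newlines are not cut in half, and it repeats the header row in every part so the same "csv header" load command works on each one. One caveat: a round trip through the csv module writes both quoted and unquoted empty fields as unquoted, so with NULL '' they would all load as NULL.

```python
import csv


def split_csv(src_path, rows_per_part, prefix):
    """Split a CSV into numbered parts, each starting with the header row.

    Returns the list of part-file paths in the order they were written.
    """
    part_paths = []
    out = None
    writer = None
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        for index, row in enumerate(reader):
            # Start a new part file every rows_per_part data rows.
            if index % rows_per_part == 0:
                if out is not None:
                    out.close()
                part_path = f"{prefix}_{len(part_paths) + 1:04d}.csv"
                out = open(part_path, "w", newline="", encoding="utf-8")
                writer = csv.writer(out, lineterminator="\n")
                writer.writerow(header)
                part_paths.append(part_path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return part_paths
```

Each resulting part can then be loaded with the same \copy command as the original file, with only the file name changed. Pick rows_per_part so that every part stays well below the size at which the error appears.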
I think I have encountered the same problem as you, but I am using the very new version 12. Is there any way to solve it? Use compressed files?
Yes, if I recall correctly the compressed files are < 4 GB and you avoid this error by using the compressed load scripts (7z or gzip).
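To see which files in a data folder are at risk before starting a long load, a small Python check along these lines may help. It is a sketch that assumes the failure threshold is 4 GiB, which is my reading of the 4 GB figure mentioned in this thread and not a confirmed limit; the function name is hypothetical.

```python
import os

# Assumed threshold: 4 GiB, the point where a size no longer fits in 32 bits.
FOUR_GIB = 4 * 1024 ** 3


def files_over_limit(folder, limit_bytes=FOUR_GIB):
    """Return (name, size) pairs for files in folder at or above the limit."""
    found = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size >= limit_bytes:
                found.append((name, size))
    return found
```

An empty result for the folder of .csv.gz files would be consistent with the compressed load scripts avoiding the error.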
OK, I will try this method now, thank you very very much for your reply
So there is no workaround at all WITHOUT compressing or splitting? What about using version 10's COPY command with the 11 or 12 engine?
As I mentioned:
I need the speed of the default COPY command, but for large files on version 12
and this is vital for my needs.
Well, PostgreSQL is open source, so you're welcome to try and contribute a fix yourself :)
Here is the relevant discussion: https://www.postgresql.org/message-id/20181104000405.GA1743%40paquier.xyz
Otherwise you have the three workarounds proposed in this thread (change version, use compressed files, split the file into multiple parts). I'm sure there are other workarounds too.
Isn't the obvious fix to migrate the working part of version 10's COPY code into 11 and 12? Or is it so hardcoded that doing so would break everything? :)
@ghYura this is a community maintained resource, so if you have suggestions for improving the codebase then I'd suggest making a pull request.
I was getting the error while loading the CSVs into the tables in both the 12.X and 13.X versions but it works like a charm in the PostgreSQL version 10.15. Thanks, everyone for the help :)