
Model saver: permission denied


#1

On Coral, I got the following error:

================================================================================
TRAINING

Epoch 0
1892336/1892336 [==============================] - 223s 118us/step - Cross-entropy: 0.5582
Traceback (most recent call last):
  File "model.py", line 432, in <module>
    main(False, bonus)
  File "model.py", line 402, in main
    saver.save(session, weight_file.name)
  File "/local/packages/python-3.6/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1484, in save
    save_relative_paths=self._save_relative_paths)
  File "/local/packages/python-3.6/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 888, in _update_checkpoint_state
    text_format.MessageToString(ckpt))
  File "/local/packages/python-3.6/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 419, in atomic_write_string_to_file
    rename(temp_pathname, filename, overwrite=True)
  File "/local/packages/python-3.6/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 401, in rename
    compat.as_bytes(oldname), compat.as_bytes(newname), overwrite, status)
  File "/local/packages/python-3.6/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/local/packages/python-3.6/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.PermissionDeniedError: /tmp/checkpoint.tmp6b5b75bfb5c44ecabd33db7038db33b8

Has anybody experienced this, or does anyone know how to get around it?


#2

Yes, I’ve gotten this too.


#3

Any solution that worked for you?


#4

OK, here’s what I’ve come up with for this:

If you’re using your own machine to write the code and then just testing on one of the coral machines to make sure the code runs in that environment, you can do this by running it in debug mode. That disables the checkpointing that’s causing the issue.

If you’re doing all your development for this assignment on the lab machines (including coral01-05), you can modify the location of the temporary file that’s used for the checkpoints. Change this line:

weight_file = NamedTemporaryFile(suffix='.weights')

to

weight_file = NamedTemporaryFile(suffix='.weights', dir=OTHERDIR)

where OTHERDIR is another directory where you have write permissions. The best option is the systemd-provided temporary directory specific to your user, which is '/run/user/ID', where ID is your numeric Unix user ID (UID), not your username. You can find your UID by running the id command (or id -u to print just the number) while logged in to any lab machine (including wolf and coral01-05).
Ideally, change this back before submitting, but you won’t lose marks or anything if you forget.
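
If you would rather not hard-code your ID, here is a rough sketch of building the path at run time. The helper name user_runtime_dir is made up for illustration and is not part of the assignment code. It assumes '/run/user/ID' exists on the machine you are on, and falls back to the default temp directory if it does not.

```python
import os
from tempfile import NamedTemporaryFile


def user_runtime_dir():
    """Return '/run/user/<uid>' if it exists and is writable, else None.

    Hypothetical helper, not part of the assignment starter code.
    """
    candidate = os.path.join('/run/user', str(os.getuid()))
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    # dir=None makes NamedTemporaryFile use the default temp directory.
    return None


OTHERDIR = user_runtime_dir()
weight_file = NamedTemporaryFile(suffix='.weights', dir=OTHERDIR)
print(weight_file.name)
```

The fallback only keeps the script running on machines without a per-user runtime directory. On those machines the file still goes to the default temp directory, so the original error could come back there.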


#5

This is what I came up with. Thanks for confirming that it’s the right thing to do.


#6

How do you run it in debug mode? Just debug == True?


#7

Yeah, change the main(False, bonus) call at the bottom of model.py to main(True, bonus).