Sessions: Race condition in FilesystemStore

Created on 11 Jun 2013  ·  23 Comments  ·  Source: gorilla/sessions

There is a race condition in FilesystemStore that I intend to fix, but I would like your input before I go ahead and do it. Basically, the problem is that if you have concurrent requests from the same user (same session), the following is possible:

  1. Request 1 opens the session to perform a moderately long operation
  2. Request 2 opens the session
  3. Request 2 removes session data to perform a "logout" or similar
  4. Request 2 saves
  5. Request 1 saves its stale copy, which makes it as if the session was never logged out
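The interleaving above can be reproduced deterministically, without goroutines or timing, using a toy in-memory store. This is a minimal sketch: `toyStore`, `Load` and `Save` are hypothetical names, not gorilla/sessions API, and the store simply copies values on load and overwrites them wholesale on save, which is effectively what a file- or Redis-backed store does.

```go
package main

import "fmt"

// toyStore is a hypothetical stand-in for a server-side session store.
// Load returns a private copy of the session values and Save overwrites
// the stored copy wholesale ("last writer wins").
type toyStore struct {
	data map[string]map[string]string
}

func (s *toyStore) Load(id string) map[string]string {
	cp := make(map[string]string)
	for k, v := range s.data[id] {
		cp[k] = v
	}
	return cp
}

func (s *toyStore) Save(id string, values map[string]string) {
	s.data[id] = values
}

func main() {
	store := &toyStore{data: map[string]map[string]string{
		"sid": {"user": "alice"},
	}}

	req1 := store.Load("sid") // 1. Request 1 opens the session
	req2 := store.Load("sid") // 2. Request 2 opens the session
	delete(req2, "user")      // 3. Request 2 "logs out"
	store.Save("sid", req2)   // 4. Request 2 saves
	store.Save("sid", req1)   // 5. Request 1 saves its stale copy

	fmt.Println(store.data["sid"]["user"]) // prints "alice": the logout is undone
}
```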

I have added a test case for this flaw at cless/sessions@f84abeda17de0b4fcd72d277412f3d3192f206f2

The most straightforward way to fix this would be to introduce locks at the file system level. However, Go has no cross-platform way to do file locking. It does expose flock in the syscall package, but that only works if the OS supports it. I believe the behavior of flock might also differ between Unixes, although I am not sure that this is the case. Another issue with flock is that it might not work on NFS.

An entirely different solution would be to keep a map of locks in the FilesystemStore object itself. This has its own set of disadvantages: you can't have multiple processes access the same file system sessions, and you can't create multiple stores for the same file system sessions within a single application. However, neither of these is possible today without causing issues anyway.

In the end, I think the best solution is to keep a map of locks in the store object, because all the disadvantages of that approach can be properly documented and you can rely on the behavior being the same across different systems.

Other storage backends that are based on FilesystemStore might copy this flaw (I noticed this issue when reviewing the RediStore code for a project of mine, boj/redistore#2).

bug stale

All 23 comments

Instead of locking, what about a transaction-based system? The Session object could carry a lastModified field. When you attempt to save the session back in to the session store it would return an error indicating it has been modified since it was last read. The programmer could then choose to retry by getting the Session again.

That could work, it has the advantage that there are no locks that need to be cleaned up and there is no chance for deadlocks. I'm not sure it's always possible for developers to do a meaningful retry depending on what the request has already done. I think locking is definitely the safer option, but transactions could certainly work if the API clearly documents the chance for failure and if developers diligently handle those errors.

That said, even if FilesystemStore uses transactions I think the API should be prepared for storage backends that _do_ use locking, but that would mean either disallowing calling session.Save() more than once _or_ introducing a new session.Release(). Which would you prefer, or would you prefer that no storage backends use locking?

Following this issue since it impacts RediStore. Thanks for the information, @cless.

I don't really have any preference at the moment other than doing the right thing in the long run :) Of course I would also like to keep the existing API. Also adding locks all over the place introduces a lot of additional complexity and overhead, which would also be nice to avoid.

Do you have any examples of other session frameworks that solve this problem? I'd be curious to see what they do.

I know the default php session handler uses filesystem based locking (in the php 5.4 tarball I just checked this is in the file ext/session/mod_files.c. It uses flock which is provided by ext/standard/flock_compat.c).

I'm not sure if PHP is a good example given its reputation, but I'm afraid it's the only one I'm aware of at the moment. I'll take a look around to see if I can find some other frameworks and how they solve the issue.

I'd be curious to see how Flask, Pyramid, or Django handle these things. I'll take a look at them if I have the time. Rails would also be interesting if anyone is familiar with that codebase.

Both Pyramid and Flask seem to be using a cookie to store the data, I suspect that neither handles race conditions. I have only briefly read the documentation so I could very well be wrong.

Django does support server side session data so I'll take a look at that code in a bit.

The default Django session backend is the database and this uses the Django database abstraction layer which I am unfamiliar with so it is hard for me to interpret. However, I found this comment in the filesystem backend:

        # Write the session file without interfering with other threads
        # or processes.  By writing to an atomically generated temporary
        # file and then using the atomic os.rename() to make the complete
        # file visible, we avoid having to lock the session file, while
        # still maintaining its integrity.
        #
        # Note: Locking the session file was explored, but rejected in part
        # because in order to be atomic and cross-platform, it required a
        # long-lived lock file for each session, doubling the number of
        # files in the session storage directory at any given time.  This
        # rename solution is cleaner and avoids any additional overhead
        # when reading the session data, which is the more common case
        # unless SESSION_SAVE_EVERY_REQUEST = True.
        #
        # See ticket #8616.

This seems to suggest that they make sure the session file contents are never mangled, but not that race conditions cannot happen. If anyone has Django experience, it would be nice to see a test case like cless/sessions@f84abed.
Stack Overflow also seems to indicate that Django might have race conditions in its sessions: http://stackoverflow.com/search?q=django+session+race+condition

Alright, well, FilesystemStore already has a coarse-grained mutex to prevent corruption of the session store. I'm not sure how much additional protection is worth putting into the library, because it would add a lot of overhead to every request just to make things more consistent for some requests. Certainly the scenario here is feasible, but I think it's probably too complicated to deal with in a general sense. I can see many cases where the current behavior is totally fine, whereas in some applications it could be a problem. I suspect that most of the time you would only have one in-flight request for a given session anyway.

Really it's an issue that's present in any web resource, the same thing would happen if you had a GET followed by a PUT based on the information, things may have changed in the meantime.

Anyway, those are my thoughts at the moment but I'm glad to discuss further.

It seems that in the long run this is more of an application domain problem.

Given the scenario cless posted, if Request 1 is expected to run long enough that Request 2 could happen in parallel (say, a Single Page Application written in Angular.js, or an asynchronous Node.js app hitting some API), it seems the long running request should be decoupled from the session and considered an independent worker process at that point.

None of this directly addresses the issue of course, but as kisielk mentioned, doing any kind of locking adds a lot of overhead for what appears to be a marginal case.

Here are some real-world examples of the impact of session race conditions. The bugs are hard to debug, show up seemingly at random, and are an annoyance for both developers and users. Given a sufficiently large application that makes extensive use of Ajax, these race conditions will show up sooner or later.
http://www.hiretheworld.com/blog/tech-blog/codeigniter-session-race-conditions
http://www.chipmunkninja.com/Troubles-with-Asynchronous-Ajax-Requests-g@

I've read the whole thread at EllisLab/CodeIgniter#1746, which deals with a similar issue, but their issue is complicated by the fact that CodeIgniter forcefully regenerates the session ID every once in a while, and most of the discussion there revolves around that.

I understand your reluctance to change sessions in an incompatible way or to introduce subtle differences in the API that might break existing applications. However, I still think there should be some optional locking involved. What about adding two functions to the store API like this:

type Store interface {
    Get(r *http.Request, name string) (*Session, error)
    New(r *http.Request, name string) (*Session, error)
    Save(r *http.Request, w http.ResponseWriter, s *Session) error
    Lock(r *http.Request, name string) error
    Release(r *http.Request, name string) error
}

Use of these functions would be entirely optional (although I would personally encourage locking in every request that is going to write to the session) and have no impact on existing applications.

There is an issue with the proposed locking interface: if the lock is held in a database such as MySQL and your database connection drops after you acquire the lock then you can never release it and your session is effectively deadlocked. Locks should expire in some way, but I am not entirely sure how that should be exposed to the developer.

Sorry I missed Boj's reply when I posted mine:
It's important to keep in mind that long running requests are not a requirement to trigger a race condition. The 500 ms sleep in my test case is there only to ensure that the race condition is triggered. In the real world these race conditions will occur for "short" requests too, just less frequently, and that's exactly what makes them hard to debug.

You also have to keep in mind that under high load even short requests might take some time to execute.

@cless Good points.

I like your proposed interface changes.

Your lock expiration would be relatively easy to implement for RediStore, Redis has an EXPIRE command which allows a TTL to be set for a key.

Wouldn't this kind of fine-grained locking still be fairly inefficient for many types of session stores? And the expiration is an issue also.

It seems part of the problem is that Save can succeed even if the session no longer exists. Would it not go a long way towards solving the problem if that were not the case?

@kisielk, can you give examples of stores that would be inefficient? Most key-value databases have some form of atomic operations that allow you to create locks. SQL databases usually have access to row based locking (though I must say I'm not familiar with the details here).
The locking method of key-value stores like redis requires multiple round trips to the database though, perhaps that is what you meant?
There is an issue with file system stores because Go does not provide us with a cross-platform way to access file locks. I suppose cgo would allow you to create one that supports the main platforms, but that's probably a solution I would avoid.

I don't think saving when the session no longer exists is a real issue unless you're talking about removing the current session and replacing it with a new session to prevent session fixation, but to be honest that is an entirely different problem.

@boj, the issue is that the programmer needs some way to verify that a lock has not yet expired before he tries to save. What should the expiration time on a lock be?

This is one possible way to go about it, but I'm not sure I like it a whole lot. You're depending on the clock not jumping forwards or backwards suddenly, and that's not a safe assumption to make:

package main

import (
    "fmt"
    "time"
)

// Lock is a stub. A real implementation would block here until the lock
// becomes yours and set the lock to expire after expiration+50ms. This
// extra 50 ms ensures that you never assume you own an expired lock. The
// returned time is the local deadline after which the lock must be treated
// as lost.
func Lock(expiration time.Duration) time.Time {
    return time.Now().Add(expiration)
}

func main() {
    end := Lock(1000 * time.Millisecond)

    // Do your thing here ...
    // time.Sleep(1100 * time.Millisecond)

    if time.Now().After(end) {
        fmt.Println("Lock expired, throw an error")
    } else {
        fmt.Println("Good to go, save the session")
    }
}

@cless Your whole comment kind of sums up @kisielk's worries: X does this, Y does that, Z might not be possible without going to extremes. While race conditions are undesirable, given how rarely they may happen, at least the current gorilla/sessions design is very simple and follows the principle of least surprise.

It could very well be simple to implement the interface part of gorilla/sessions as you proposed; however, you create a scenario where you may or may not be able to easily swap out backends, unless they happen to implement Lock/Release in an equally easy and efficient manner. There are too many what-ifs in what you propose, the foremost being "can the backend even implement this without resorting to hackish programming?", followed by "I tried to swap from FilesystemStore to FooBarStore, but it doesn't appear to implement Lock/Release and my program doesn't behave as expected."

There's a good article about the nuances of per-session locking here:

http://thwartedefforts.org/2006/11/11/race-conditions-with-ajax-and-php-sessions/

As we've already concluded here, the timeout for locks when something goes wrong is one of the major issues in implementing this kind of scheme. This needs more thought and is very backend-dependent.

As far as implementing Lock/Release we could make the per-session lock implementation optional by having a separate interface. The backend could then be asserted against that interface and if it does not implement it we could provide a default implementation using in-memory locks.

Another issue of course is that users of the library will need to update their code to use Lock and Release, but I guess without that they would have the same behaviour as they do now.

"we could provide a default implementation using in-memory locks"

This would be meaningless in a multi-server environment where requests are load balanced.

I like the idea to provide an entirely different interface for locking because it allows you more freedom to implement it correctly without altering the behavior of existing code that uses sessions.

As @boj noted you can't really provide memory locks on a database store in any meaningful way. The burden of providing the lock functionality should probably be on the store developer. If the locks interface is missing you can still return errors when the session tries to use them. That really shouldn't be a problem, if the default stores provide locking then other developers will follow suit and users will be minimally, if at all, inconvenienced.

Memory locks can still make sense on some stores, and I think the FilesystemStore is a good candidate for them. Realistically you won't use the filesystem store in a load balanced situation (although theoretically possible with network filesystems). Filesystem locks seem preferable, but they appear to be hard to implement cross platform in go, and on top of that they aren't guaranteed to work with NFS either.

@boj: agreed, but as @cless points out, that depends on the kind of store you have. FilesystemStore already relies on an RWMutex to prevent concurrent modifications of the session store. I expect most of the backends would eventually implement the locking interface, but having a default would allow a developer to rely on one that's suitable for a single-server environment in the meantime. FilesystemStore would be one of those until the cross-platform file locking situation in Go libraries gets better.

I agree totally with @cless and want to stress the importance of session locking; mature platforms (such as PHP) implement it in their session stores. For files it uses the 'flock' syscall, and for memcache it uses the return value of the 'add' operation on an extra key with a '.lock' suffix and an expiry set. I also agree with @kisielk's approach of having a new interface (with the functions 'Lock' and 'Release'). Can we have this, please? Go should be the language to go to for high performance and reliability. I am willing to contribute some implementations (for instance FileSystem, Redis and Memcache). Note that we do need a few new configuration options:

  • spinLockWait: milliseconds to sleep between attempts to acquire the lock (default: 150)
  • lockMaxWait: seconds to wait for a lock (default: 30)
  • lockMaxAge: seconds a lock may be kept before it is released automatically (default: lockMaxWait)

This issue has been automatically marked as stale because it hasn't seen a recent update. It'll be automatically closed in a few days.
