Django-rest-framework: PUT calls don't fully "replace the state of the target resource"

Created on 30 Jun 2016  ·  68 Comments  ·  Source: encode/django-rest-framework

EDIT: For the current status of the issue, skip to https://github.com/encode/django-rest-framework/issues/4231#issuecomment-332935943

===

I'm having an issue implementing an Optimistic Concurrency library on an application that uses DRF to interact with the database. I'm trying to:

  • Confirm that the behavior I'm seeing is attributable to DRF
  • Confirm that this is the intended behavior
  • Determine if there's any practical way to overcome this behavior

I recently added optimistic concurrency to my Django application. To save you the Wiki lookup:

  • Every model has a version field
  • When an editor edits an object, they get the version of the object they're editing
  • When the editor saves the object, the included version number is compared to the database
  • If the versions match, the editor updated the latest document and the save goes through
  • If the versions don't match, we assume a "conflicting" edit was submitted between the time the editor loaded and saved so we reject the edit
  • If the version is missing, we cannot do testing and should reject the edit
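The rules in the list above can be sketched in plain Python. This is only an illustration under simple assumptions (an in-memory dict stands in for the database; `save_edit` and `ConflictError` are names invented for this example, not part of any library):

```python
class ConflictError(Exception):
    """Raised when an edit is based on a stale or missing version."""


def save_edit(store, key, fields, submitted_version=None):
    """Apply `fields` to store[key] only if `submitted_version` matches."""
    current = store[key]
    if submitted_version is None:
        # No version supplied: the check cannot run, so reject the edit.
        raise ConflictError("version is missing")
    if submitted_version != current["version"]:
        # Someone else saved between this editor's load and save.
        raise ConflictError("version is stale")
    # Versions match: accept the edit and bump the version.
    store[key] = {**current, **fields, "version": current["version"] + 1}
    return store[key]


store = {1: {"title": "draft", "version": 3}}
save_edit(store, 1, {"title": "first edit"}, submitted_version=3)  # accepted
```

A second call that still submits version 3, or no version at all, would raise `ConflictError` in this sketch.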

I had a legacy UI talking through DRF. The legacy UI did not handle version numbers. I expected this to cause concurrency errors, but it did not. If I understand the discussion in #3648 correctly:

  • DRF merges the PUT with the existing record. This causes a missing version ID to be filled with the current database ID
  • Since this always provides a match, omitting this variable will always break an optimistic concurrency system that communicates across DRF
  • ~There are no easy options (like making the field "required") to ensure the data is submitted every time.~ (edit: you can work around the issue by making it required as demonstrated in this comment)

Steps to reproduce

  1. Set up an Optimistic Concurrency field on a model
  2. Create a new instance and update several times (to ensure you no longer have a default version number)
  3. Submit an update (PUT) through DRF excluding the version ID

Expected behavior

The missing version ID should not match the database and cause a concurrency issue.

Actual behavior

The missing version ID is filled by DRF with the current ID so the concurrency check passes.
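To make the reported behavior concrete, here is a small plain-Python model of it. It does not call DRF; `merged_update` and `version_check_passes` are hypothetical helpers that only mimic "absent fields keep the stored value" as described above:

```python
def merged_update(instance, payload):
    """Model of the reported behavior: absent keys keep the stored value."""
    return {**instance, **payload}


def version_check_passes(instance, candidate):
    """The concurrency check: candidate version must equal the stored one."""
    return candidate["version"] == instance["version"]


stored = {"id": 7, "body": "old", "version": 5}
legacy_put = {"id": 7, "body": "new"}  # the legacy UI sends no version

candidate = merged_update(stored, legacy_put)
check_passed = version_check_passes(stored, candidate)
```

Because the merge copies the stored version into the candidate, the check compares the database value with itself and always passes.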


All 68 comments

Okay, can't promise I'll be able to review this pretty in-depth ticket immediately, as the upcoming 3.4 release takes priority. But thanks for such a detailed, well thought through issue. This'll most likely be looked at on the scale of weeks, not days or months. If you make any progress or have any further thoughts yourself, please do update the ticket and keep us informed.

OK. I'm pretty sure my issue is the combination of two factors:

  1. DRF doesn't require the field in the PUT (even though it is required in the model) because it has a default (version=0)
  2. DRF merges the PUT fields with the current object (without injecting the default)

As a result, DRF uses the current (database) value and breaks the concurrency control. The second half of the issue is related to the discussion in #3648 (also cited above) and there's a (pre 3.x) discussion in #1445 that still appears to be relevant.

I'm hoping a concrete (and increasingly common) case where the default behavior is perverse will be enough to reopen the discussion about the "ideal" behavior of a ModelSerializer. Obviously, I'm only an inch deep on DRF, but my intuition is that the following behavior is appropriate for a required field and a PUT:

  • When using a non-partial serializer, we should either receive the value, use the default, or (if no default is available) raise a validation error. Model-wide validation should apply to only the inputs/defaults.
  • When using a partial serializer, we should either receive the value or fallback on the current values. Model-wide validation should apply to that combined data.
  • I believe the current "non-partial" serializer is really quasi-partial:

    • It's non-partial for fields that are required and have no default

    • It's partial for fields that are required and have a default (since the default isn't used)

    • It's partial for fields that are not required

We can't change bullet (1) above or the defaults become useless (we would require the input even though we know the default). That means we have to fix the issue by changing bullet (2) above. I agree with your argument in #2683 that:

Model defaults are model defaults. The serializer should omit the value and defer responsibility to the Model.object.create() to handle this.

To be consistent with that separation of concerns, update should create a new instance (delegating all defaults to the model) and apply the submitted values to that new instance. This results in the behavior requested in #3648.

Trying to describe the migration path helps highlight how odd the current behavior is. The end goal is to

  1. Fix the ModelSerializer,
  2. Add a flag for this quasi-partial state, and
  3. Make that flag the default (for backwards compatibility)

What is the name of that flag? The current Model Serializer is actually a partial serializer that (somewhat arbitrarily) requires fields that meet the condition required==True and default==None. We can't explicitly use the partial flag without breaking backwards compatibility so we need a new (hopefully temporary) flag. I'm left with quasi_partial, but my inability to express the arbitrary requirement required==True and default==None is why it's so clear to me that this behavior should be deprecated urgently.

You can add extra_kwargs in serializer's Meta, making version a required field.

class ConcurrentModelSerializer(serializers.ModelSerializer):
    class Meta:
        model = ConcurrentModel
        extra_kwargs = {'version': {'required': True}}

Thanks @anoopmalev. That'll keep me on the production branch.

After "sleeping on it" I realize there's an extra wrinkle. Everything I said should apply to the serializer's fields. If a field isn't included in the serializer, it should not be modified. In this way, all serializers are (and should be) partial for the non-included fields. This is a little more complicated than my "make a new instance" above.

I believe this issue needs to be reduced to a more constrained proposal in order to move forward.
Seems too broad to be actionable in its current state.
For now I'm closing this - if anyone can reduce it to a concise, actionable statement of desired behavior then we can reconsider. Until then I think it's simply too broad.

Here's a concise proposal... for a non-partial serializer:

  1. For any field not listed in the serializer (implicitly or explicitly) or marked as read-only, preserve the existing value
  2. For all other fields, use the first option available:

    1. Populate with the submitted value

    2. Populate with a default, including a value implied by blank and/or null

    3. Raise an exception

For clarity, validation is run on the final product of this process.
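A minimal sketch of this proposal in plain Python, to pin down the order of the rules. Nothing here is DRF API: `resolve_full_update`, `MISSING`, and the local `ValidationError` are invented for illustration, and only explicit defaults are modeled (not values implied by blank/null):

```python
MISSING = object()  # sentinel meaning "this field declares no default"


class ValidationError(Exception):
    pass


def resolve_full_update(instance, payload, writable_defaults):
    """writable_defaults maps each writable serializer field to its default,
    or to MISSING when it has none. Fields absent from the mapping
    (unlisted or read-only) are left untouched."""
    result = dict(instance)  # rule 1: preserve everything else
    for name, default in writable_defaults.items():
        if name in payload:
            result[name] = payload[name]  # rule 2.1: submitted value
        elif default is not MISSING:
            result[name] = default        # rule 2.2: declared default
        else:
            raise ValidationError(f"{name}: this field is required")  # rule 2.3
    return result


instance = {"id": 1, "title": "old", "version": 4, "created_by": "ann"}
fields = {"title": MISSING, "version": 0}
updated = resolve_full_update(instance, {"title": "new"}, fields)
```

In this sketch the omitted `version` falls back to its default of 0 rather than the stored 4, so a concurrency check on the result would fail as intended, while `created_by` is preserved because the serializer does not list it.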

I.e., you want to set required=True on any serializer field that doesn't have a model default, for updates?

Have I got that correct?

Yes (and more). That's how I understand the partial (all fields optional) vs. non-partial (all fields required) distinction. The only time a non-partial serializer doesn't require a field is the presence of a default (narrowly or broadly defined) _since the serializer can use that default if no value is provided._

The italicized section is what DRF is not currently doing and the more important change in my proposal. The current implementation just skips the field.

I had a second proposal mixed in, but it's really a separate question of how generous you want to be with the idea of a "default". The current behavior is "strict" in that only a default is treated as such. If you _really_ wanted to reduce the amount of required data, you could make blank=True fields optional as well... assuming that an absent value is a blank value.

@claytondaley I've been using OOL with DRF since 2.x this way:

class VersionModelSerializer(serializers.ModelSerializer, BaseSerializer):
    _initial_version = 0

    _version = VersionField()

    def __init__(self, *args, **kwargs):
        super(VersionModelSerializer, self).__init__(*args, **kwargs)

        # version field should not be required if there is no object
        if self.instance is None and '_version' in self.fields and\
                getattr(self, 'parent', None) is None:
            self.fields['_version'].read_only = True
            self.fields['_version'].required = False

        # version field is required while updating instance
        if self.instance is not None and '_version' in self.fields:
            self.fields['_version'].required = True

        if self.instance is not None and hasattr(self.instance, '_version'):
            self._initial_version = self.instance._version

    def validate__version(self, value):
        if self.instance is not None:
            if not value and not isinstance(value, int):
                raise serializers.ValidationError(_(u"This field is required"))

        return value
   # more code & helpers

It works just great with all kinds of business logic and has never caused any problems.

Was this left closed by accident? I responded to the specific question and didn't hear what was wrong with the proposal.

@claytondaley why should OOL be a part of DRF? Check my code – it works just fine in a large app (1400 tests). VersionField is just an IntegerField.

You've hard-coded the OOL into the serializer. This is the wrong place to do it because you have a race condition. Parallel updates (with the same prior version) would all pass at the serializer... but only one would win at the save action.

I'm using django-concurrency which puts the OOL logic into the save action (where it belongs). Basically UPDATE... WHERE version = submitted_version. This is atomic so there's no race condition. However, it exposes a flaw in the serialization logic:

  • If default is set on a field in the model, DRF sets required=False. The (valid) idea is that DRF can use that default if no value is submitted.
  • If that field is missing, however, DRF doesn't use the default. Instead it merges the submitted data with the current version of the object.

When we don't require the field, we do so because we have a default to use. DRF doesn't fulfill that contract because it doesn't use the default... it uses the existing value.

The underlying issue was discussed before, but they didn't have a nice, concrete case. OOL is that ideal case. The existing value of a version field always passes OOL so you can bypass the entire OOL system by leaving out version. That's (obviously) not the desired behavior of an OOL system.
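The "UPDATE... WHERE version = submitted_version" idea mentioned above can be tried out with the standard-library sqlite3 module. This is a sketch of the general technique only, not django-concurrency's actual implementation; the table, columns, and `versioned_update` helper are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO doc VALUES (1, 'old', 5)")


def versioned_update(conn, doc_id, body, submitted_version):
    """Return True if the row was updated, False on a version conflict."""
    cur = conn.execute(
        "UPDATE doc SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (body, doc_id, submitted_version),
    )
    # The comparison and the write happen in one statement, so two writers
    # holding the same prior version cannot both match the row.
    return cur.rowcount == 1


first = versioned_update(conn, 1, "edit A", 5)   # matches version 5
second = versioned_update(conn, 1, "edit B", 5)  # row is now at version 6
```

The check only protects the row if the submitted version really comes from the client. If a layer above fills in the stored version for a request that omitted it, the WHERE clause always matches.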

@claytondaley

You've hard-coded the OOL into the serializer.

Did I? Have you found any OOL logic in my serializer beside field requirement?

This is the wrong place to do it because you have a race condition.

Sorry, I just can't see where the race condition is here.

I'm using django-concurrency which puts the OOL logic into the save action (where it belongs).

I'm also using django-concurrency :) But that's at the model level, not the serializer. On the serializer level you just need to:

  • make sure _version field is always required (when it should be)
  • make sure your serializer knows how to handle OOL errors (this part I've omitted)
  • make sure your apiview knows how to handle OOL errors and raises HTTP 409 with possible diff context

Actually, I'm not using django-concurrency due to an issue that the author marked as "won't fix": it bypasses OOL when obj.save(update_fields=['one', 'two', 'tree']) is used, which I consider bad practice, so I forked the package.

Here is the missing save method of the serializer I mentioned earlier. That should solve all of your issues:

    def save(self, **kwargs):
        try:
            self.instance = super(VersionModelSerializer, self).save(**kwargs)
            return self.instance
        except VersionException:
            # Use select_for_update so we have some level of guarantee
            # that object won't be modified at least here at the same time
            # (but it may be modified somewhere else, where select_for_update
            # is not used!)
            with transaction.atomic():
                db_instance = self.instance.__class__.objects.\
                    select_for_update().get(pk=self.instance.pk)
                diff = self._get_serializer_diff(db_instance)

                # re-raise exception, so api client will receive friendly
                # printed diff with writable fields of current serializer
                if diff:
                    raise VersionException(diff)

                # otherwise re-try saving using db_instance
                self.instance = db_instance
                if self.is_valid():
                    return super(VersionModelSerializer, self).save(**kwargs)
                else:
                    # there are errors that could not be displayed to a user
                    # so api client should refresh & retry by itself
                    raise VersionException

        # instance.save() was interrupted by application error
        except ApplicationException as logic_exc:
            if self._initial_version != self.instance._version:
                raise VersionException

            raise logic_exc

Sorry. I didn't read your code to figure out what you were doing. I saw a serializer. You can obviously work around the issue by hacking the serializer but you shouldn't have to.... because the flaw in the DRF logic stands on its own. I'm just using OOL to make the point.

And you should try that code against the latest version of django-concurrency (using IGNORE_DEFAULT=False). django-concurrency was also ignoring default values, but I submitted a patch. There was an odd corner case that I had to hunt down to make it work for normal cases.

I think it's called extending default functionality, not really hacking. I think the best place for such feature support is at django-concurrency package.

I've reread the whole issue discussion and found your proposal too broad; it would fail in many places (due to magically using default values from different sources under different conditions). DRF 3.x just got much easier and more predictable than 2.x, let's keep it that way :)

You can't fix this in the model layer because it's broken at the serializer (before it gets to the model). Set OOL aside... why don't we require a field if default is set?

A non partial serializer "requires" all fields (fundamentally) and yet we let this one by. Is it a bug? Or do we have a logical reason?

as you can see in my code example – _version field is always correctly required in all possible cases.

btw, it turned out that I've borrowed the model-level code from https://github.com/gavinwahl/django-optimistic-lock and not from django-concurrency, which is way too complex for almost no reason.

... so the bug is "non-partial serializers incorrectly set some fields to not-required". That's the alternative. Because that's the (implicit) commitment that a non-partial serializer makes.

I can quote it:

By default, serializers must be passed values for all required fields or they will raise validation errors.

This says nothing about required (except when a default is provided).

(and I get that I'm talking about two different levels, but the ModelSerializer shouldn't un-require fields if it's not going to take responsibility for that decision)

I think I've lost your point..

(and I get that I'm talking about two different levels, but the ModelSerializer shouldn't un-require fields if it's not going to take responsibility for that decision)

What's wrong with that?

OK let me try a different angle.

  • Assume I have a non-partial Model Serializer (edit: all defaults) that covers all the fields in my model.

Should a CREATE or UPDATE with the same data ever produce a different object (minus the ID)?

Can you describe your ideas using a really simple model & serializer and a few lines that show the failed / expected behavior?

I'll put something together tomorrow as it's getting late here... but the deeper I get, the more sense #3648 makes for a non-partial serializer. In the meantime, why doesn't a ModelSerializer require all fields in the model? Maybe your rationale is different than mine.

ModelSerializer inspects the bound model and decides whether each field should be required, doesn't it?

I don't mean mechanically how. The base assumption for a non-partial Serializer is to require everything (quoted above). If get_field_kwargs is going to deviate from this assumption (specifically, here), it should have a good reason. What is that reason?

My preferred answer is the one I keep giving, "because it can use that default if no value is submitted" (but then DRF has to actually use the default). Is there another answer I'm missing? A reason why fields with defaults should not be required?

Obviously, I prefer the "complete" solution. However, I'll concede that there's a second answer. We could require these fields by default. That eliminates the (currently arbitrary) special case. It simplifies/reduces the code. It's internally consistent. It addresses my concern.

Basically, it makes the non-partial serializer truly non-partial.

Now I at least know what you mean. Have you checked what ModelForm's behaviour is in such a case? (I can't do this myself on mobile.)

The Django docs say that 'blank' controls whether a field is required or not. I suggest you open a separate ticket for this issue since this one contains a lot of unrelated comments. In my opinion ModelSerializer might work like ModelForm: the 'blank' option controls required, 'null' tells if None is an acceptable input, and 'default' has no effect on that logic.

I'm willing to open a second ticket, but I'm worried that blank requires similar code. From the django discussion group:

if we take an existing model form and template that works, add an optional character field to the model but fail to add a corresponding field to the HTML template (e.g. human error, forgot about a template, didn't tell the template author to make a change, didn't realise a change needed to be made to a template), when that form is submitted Django will assume that the user has provided an empty string value for the missing field and save that to the model, erasing any existing value.

To be consistent, we would have an obligation to fulfill the second half of the contract, setting the absent value to blank. This is slightly less problematic because a blank can be filled without reference to a model, but is very similar (and, I think, consistent with #3648).

@tomchristie can you give some short input on this: Why required state depends on model field defaults property?

Why required state depends on model field defaults property?

Simply this: If a model field has a default, then you can omit providing it as input.

Actually I agree with this behavior. ModelForm, despite the code, is doing the same (the generated HTML will provide defaults). If DRF used different logic, then 'default' would never apply. I'm done with this issue.

@pySilver actually, here's the ModelForm behavior:

# models.py

from django.db import models

class MyModel(models.Model):
    no_default = models.CharField(max_length=100)
    has_default = models.CharField(max_length=100, default="iAmTheDefault")

For clarity, stuff is still named "partial" because the _update_ is partial. I was also testing a complete ("full") update, but the code was unnecessary to show the behavior:

# in manage.py shell
>>> from django import forms
>>> from django.conf import settings
>>> from form_serializer.models import MyModel
>>>
>>> class MyModelForm(forms.ModelForm):
...     class Meta:
...         model = MyModel
...         fields = ['no_default', 'has_default']
...
>>>
>>> partial = MyModel.objects.create()
>>> partial.id = 2
>>> partial.no_default = "Must replace me"
>>> partial.has_default = "I should be replaced"
>>> partial.save()
>>>
>>>
>>> POST_PARTIAL = {
...     "id": 2,
...     "no_default": "must change me",
... }
>>>
>>>
>>> form_partial = MyModelForm(POST_PARTIAL)
>>> form_partial.is_valid()
False
>>> form_partial._errors
{'has_default': [u'This field is required.']}

ModelForm requires this input even though it has a default. This is one of the two internally-consistent behaviors.

Why required state depends on model field defaults property?

Simply this: If a model field has a default, then you can omit providing it as input.

@tomchristie agree in principle. But what is the expected behavior?

  • On create, I get the default (trivial, everyone agrees this is right)
  • On update, what should I get?

It seems to me that I should get the default on update as well. I don't see why a non-partial serializer should behave any differently in the two cases. Non-partial means I'm sending the "complete" record. Thus the complete record should be replaced.

I'd expect the value to be unchanged if it's not provided on update. I see the point, but transparently overwriting with the default value would be counter-intuitive from my POV.

(If anything I think it'd probably actually be better for all updates to be partial semantics for all fields - PUT would still be idempotent, which is the important aspect, tho possibly awkward to change given current behavior)

I certainly don't share your preferences; I want all my interfaces to be strict unless I deliberately make them otherwise. However, your PARTIAL vs. NON-PARTIAL distinction already provides (in theory) what we both want.

I believe partial behaves exactly as you want:

  • UPDATEs are 100% partial
  • CREATEs (I assume) are partial with respect to default and blank (logical exceptions). In all other cases the model/database constraints bind.

I'm just trying to get consistency in the non-partial serializer. If you eliminate the special-case for default, your existing non-partial serializers become the strict serializer I want. They also reach parity with ModelForm.

I realize this creates a small discontinuity within the project, but this isn't the first time someone's made a change like this. Add a "legacy" flag defaulting to the current behavior, add a warning (that the default behavior will change), and change the default in a subsequent major release.

More importantly, if you want your serializers to be the new de facto for Django, you're going to end up making this change anyway. The number of people converting from ModelForm will vastly exceed the existing user base and they'll expect at least this change.

Inserting my two cents:
I'm inclined to agree with @claytondaley. PUT is an idempotent resource replacement, PATCH is an update to the existing resource. Take the following example:

class Profiles(models.Model):
    username = models.CharField()
    role = models.CharField(default='member', choices=(
        ('member', 'Member'), 
        ('moderator', 'Moderator'),
        ('admin', 'Admin'), 
    ))

New profiles sensibly have the default member role. Let's take the following requests:

POST /profiles username=moe
PUT /profiles/1 username=curly
PATCH /profiles/1 username=larry&role=admin
PUT /profiles/1 username=curly

As it currently stands, after the first PUT, the profile data would contain {'username': 'curly', 'role': 'member'}. After the second PUT, you would have {'username': 'curly', 'role': 'admin'}. Does this not break idempotency? (I'm not entirely sure - am legitimately asking)

Edit:
I think everyone is on the same page about PATCH's semantics.
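The four-request sequence above can be replayed in a few lines of plain Python. This is a sketch with dicts standing in for the Profiles table; `put_merge` models the current behavior as described in this thread and `put_replace` models full replacement with the model default, and neither calls DRF:

```python
DEFAULTS = {"role": "member"}


def post(db, payload):
    new_id = len(db) + 1
    db[new_id] = {**DEFAULTS, **payload}
    return new_id


def patch(db, pk, payload):
    db[pk] = {**db[pk], **payload}


def put_merge(db, pk, payload):
    """Omitted fields keep whatever is currently stored."""
    db[pk] = {**db[pk], **payload}


def put_replace(db, pk, payload):
    """Omitted fields fall back to the model default."""
    db[pk] = {**DEFAULTS, **payload}


def run(put):
    """Replay POST, PUT, PATCH, PUT and return the state after each PUT."""
    db = {}
    pk = post(db, {"username": "moe"})
    put(db, pk, {"username": "curly"})
    after_first_put = dict(db[pk])
    patch(db, pk, {"username": "larry", "role": "admin"})
    put(db, pk, {"username": "curly"})
    return after_first_put, dict(db[pk])


merge_result = run(put_merge)
replace_result = run(put_replace)
```

With `put_merge` the two identical PUT requests leave different roles behind (member, then admin); with `put_replace` both leave `role` as member.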

After the second PUT, you would have {'username': 'curly', 'role': 'admin'}

Personally, I would be surprised if role switched back to the default (though I see the reason for this replace-object discussion, I've never had any real world issues with it yet)

i've never had any real world issues with it yet

Same here, but so far our projects have relied on PATCH :)
That said, the OP's use case w/ model versioning to handle concurrency does make sense to me. I would expect PUT to use the default value (if the value is omitted), raising the concurrency exception.

Let me start by acknowledging that a serializer need not necessarily follow the RESTful RFC. However, they should at least offer modes that _are_ compatible -- especially in a package that's offering REST support.

My original argument was from first principles, but the RFC (section 4.3.4) specifically says (emphasis added):

The fundamental difference between the POST and PUT methods is highlighted by the different intent for the enclosed representation. The target resource in a POST request is intended to handle the enclosed representation according to the resource's own semantics, whereas the enclosed representation in a PUT request is defined as replacing the state of the target resource. Hence, the intent of PUT is idempotent and visible to intermediaries, even though the exact effect is only known by the origin server.
...
An origin server that allows PUT on a given target resource MUST send a 400 (Bad Request) response to a PUT request that contains a Content-Range header field (Section 4.2 of [RFC7233]), since the payload is likely to be partial content that has been mistakenly PUT as a full representation. Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, or by using a different method that has been specifically defined for partial updates (for example, the PATCH method defined in [RFC5789])

So a PUT should never be partial (see also, here). However, the section on PUT also clarifies:

The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload. A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.

The point about the GET (while not obligatory) argues for my "compromise" solution. While injecting blanks/defaults is convenient, it would not provide this behavior. The nail in the coffin is probably that this solution minimizes confusion since there won't be any missing fields to raise doubt.

Obviously, PATCH is a specified option for partial updates, but it's described as a "set of instructions" rather than just a partial PUT so it always makes me a little antsy. The section on POST (4.3.3) actually states:

The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics. For example, POST is used for the following functions (among others):

  • Providing a block of data, such as the fields entered into an HTML form, to a data-handling process;

...

  • Appending data to a resource's existing representation(s).

I think there's an argument for using POST for partial updates since:

  • conceptually, amending data isn't dissimilar from appending
  • POST is permitted to use its own rules so those rules could be a partial update
  • this operation can easily be distinguished from a CREATE by the presence of an ID

Even if DRF doesn't aspire to full compliance, we need a serializer that's compatible with the spec PUT operation (i.e. replacing the entire object). The simplest (and clearly least confusing) answer is to require all fields. It also suggests that PUT should be non-partial by default and that partial updates should use a different keyword (PATCH or even POST).

I think I've just hit my first PUT issue while migrating our app to drf3.4.x :)

    @cached_property
    def _writable_fields(self):
        return [
            field for field in self.fields.values()
            if (not field.read_only) or (field.default is not empty)
        ]

This makes my .validated_data contain data that I neither provided in the PUT request nor set manually within the serializer. The values were taken from default= at the serializer level. So while I intended to update one particular field, I also overwrite some of the other fields with default values out of the blue.

Luckily for me, I'm using a custom ModelSerializer, so I can fix the issue easily.

@pySilver I don't understand the content of the latest comment.

@rpkilby "Let's take the following requests... Does this not break idempotency"

Nope, each PUT request is idempotent in that it can be repeated multiple times resulting in the same state. That doesn't mean that if some other part of state has been modified in the meantime that it'll somehow be reset.

Here's some different options for PUT behavior.

  • Fields are required unless required=False or they have a default. (Existing)
  • All fields are required. (Stricter, more closely aligned semantics of complete update _but_ awkward because it's actually stricter than the initial creation semantics for POST)
  • No fields are required (Ie. Just mirror PATCH behavior)

Clear that there's no _absolute_ correct answer, but I believe we've got the most practical behavior as it currently stands.
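For comparison, the three options can be written as a tiny plain-Python function. This is purely illustrative: `required_fields` and the policy names are invented here, and each field is reduced to two booleans:

```python
def required_fields(fields, policy):
    """fields maps name -> {'required': bool, 'has_default': bool}.
    Returns the set of field names a PUT must include under `policy`."""
    if policy == "existing":
        # Required unless required=False or the field has a default.
        return {
            name for name, f in fields.items()
            if f["required"] and not f["has_default"]
        }
    if policy == "strict":
        return set(fields)  # every field must be sent
    if policy == "patch_like":
        return set()        # nothing is required
    raise ValueError(f"unknown policy: {policy}")


fields = {
    "title": {"required": True, "has_default": False},
    "version": {"required": True, "has_default": True},
    "notes": {"required": False, "has_default": False},
}
existing = required_fields(fields, "existing")
strict = required_fields(fields, "strict")
patch_like = required_fields(fields, "patch_like")
```

Only the "strict" policy forces a client to send `version` in this example, which is the case the original report is concerned with.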

I believe some use cases might find it problematic if there's a field that doesn't need to be provided for POST requests, but subsequently does for PUT requests. Moreover, PUT-as-create is itself a valid operation, so again, it'd be odd if that had different "requiredness" semantics to POST.

If someone wants to take this forward I'd _strongly_ suggest starting as a third party package, that implements differing base serializer classes. We can then link to that from the serializers documentation. If the case is well made, then we can consider adapting the default behavior at some point in the future.

@tomchristie what I meant to say:

I have a serializer with a read-only language field and a model:

class Book(models.Model):
    title = models.CharField(max_length=100)
    language = models.CharField(max_length=2, default='en', choices=(('pl', 'Polish'), ('en', 'English')))

class BookUpdateSerializer(serializers.ModelSerializer):
    # language is read-only, I don't want to let users update that field using this serializer
    language = serializers.ChoiceField(default='en', choices=(('pl', 'Polish'), ('en', 'English')), read_only=True)
    class Meta:
        model = Book
        fields = ('title', 'language', )

book = Book(title="To be or 42", language="pl")
book.save()

s = BookUpdateSerializer(book, data={'title': 'Foobar'}, partial=True)
s.is_valid()
assert 'language' in s.validated_data # !!!
assert 'pl' == s.validated_data['language'] # AssertionError... here :(

  • I did not pass language in the request and I don't expect to see it in the validated data. If it is passed to update, it would overwrite my instance with the default even though the object already has a non-default value assigned.
  • It would be less problematic if validated_data['language'] were book.language in that case.

@pySilver - Yup, that's been resolved in https://github.com/tomchristie/django-rest-framework/pull/4346 just today.

As it happens you don't need default= on the serializer field in the example you have, as you have a default on the ModelField.

@tomchristie Do you at least agree that the current PUT behavior is not RFC spec? And that both of my suggestions (require all or inject defaults) would make it so?

@tomchristie great news!

As it happens you don't need default= on the serializer field in the example you have, as you have a default on the ModelField.

Yeah, I just wanted to make it super explicit for demo.

It's finally sinking in that @tomchristie is not arguing for/against serializer behavior in isolation. I believe his objections stem (implicitly) from the requirement that a single serializer support all REST modes. This exhibits itself in his complaints about how a strict serializer will affect a POST. Since the REST modes are incompatible, the current solution is a serializer that's not spec for any single mode.

If that's the real root of the objection, let's take it head-on. How can a single serializer provide spec behavior for all REST modes? My off-the-cuff answer is that PARTIAL vs. NON-PARTIAL is implemented at the wrong level:

  • We have partial and non-partial serializers. This approach means we need multiple serializers to support the spec behavior for all the modes.
  • We actually need partial vs. non-partial validation (or something in this vein). The different REST modes need to request different validation modes from the serializer.

To provide separation of concerns, a serializer shouldn't know the REST mode so it can't be implemented as a 3rd party serializer (nor, I suspect, does the serializer even have access to the mode). Instead, DRF ought to pass an extra piece of information to the serializer (roughly replace=True for PUT). The serializer can decide how to implement this (require all of the fields or inject the defaults).

Obviously, this is just a rough proposal, but maybe it will break the deadlock.

Moreover, PUT-as-create is itself a valid operation, so again, it'd be odd if that had different "requiredness" semantics to POST.

I agree that you can create with a PUT, but I disagree that the semantics are the same. PUT works on a specific resource:

The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload.

I believe, therefore, that the create semantics actually differ:

  • POST to /citizen/ expects a SSN (social security number) to be generated
  • PUT to /citizen/<SSN> updates the data for a specific SSN. If there is no data at that SSN, it results in a create.

Because the "id" must be included in the URI of PUT, you can treat it as required. By contrast, the "id" is optional in a POST.
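The difference between the two create paths can be sketched with a small in-memory store. This is plain Python, not DRF code; `CitizenStore` and its method names are hypothetical and exist only to illustrate the semantics described above.

```python
import itertools


class CitizenStore:
    """Hypothetical in-memory stand-in for the /citizen/ collection."""

    def __init__(self):
        self.records = {}
        self._ids = itertools.count(1)

    def post(self, data):
        # POST /citizen/: the server generates the identifier.
        ssn = str(next(self._ids))
        self.records[ssn] = dict(data)
        return ssn, "created"

    def put(self, ssn, data):
        # PUT /citizen/<ssn>: the client names the resource in the URI.
        # The stored state is created or replaced wholesale, never merged.
        status = "replaced" if ssn in self.records else "created"
        self.records[ssn] = dict(data)
        return ssn, status


store = CitizenStore()
generated_id, _ = store.post({"name": "Ada"})
_, first = store.put("078-05-1120", {"name": "Grace", "city": "NYC"})
_, second = store.put("078-05-1120", {"name": "Grace"})
```

In this sketch the second PUT drops `city`, because the new representation replaces the old one instead of being merged into it.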

Because the "id" must be included in the URI of PUT, you can treat it as required. By contrast, the "id" is optional in a POST.

Indeed. I was referring specifically to the fact that the proposed change of "make PUT strictly require _all_ fields" would mean that PUT-as-create would have different behavior to POST-as-create wrt. if fields are required or not.

Having said that I'm coming around to the value in having an option of PUT-is-strict behavior.

(Enforce that _all_ fields are strictly required in this case, enforce that _no_ fields are required in PATCH, and use the required= flag for POST)

How can a single serializer provide spec behavior for all REST modes?

We can differentiate between create, update and partial update given how the serializer is instantiated, so I don't think that's a problem.

You've already made the point that you can create using a PUT or POST. They have different semantics and different requirements so create needs to be agnostic to the REST mode. I think the distinction really happens as part of is_valid. We ask for a specific validation mode:

  • no field-presence validation (PATCH)
  • validation based on required flags (POST)
  • strict field-presence validation (PUT)

By keeping keyword-specific logic out of the CRUD operations, we also reduce the coupling between the serializer and DRF. If the validation modes were configurable, they'd be completely general-purpose (even if we only implement 3 specific cases for our 3 keywords).
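A minimal sketch of the three field-presence modes, assuming nothing about DRF internals. The `Presence` enum and `missing_fields` helper are hypothetical names, and the field table is invented for illustration.

```python
from enum import Enum


class Presence(Enum):
    NONE = "none"                # PATCH: no field-presence validation
    REQUIRED_FLAGS = "required"  # POST: honour each field's required flag
    STRICT = "strict"            # PUT: every field must be present


def missing_fields(data, required_flags, mode):
    """Return the sorted names that `mode` demands but `data` omits.

    `required_flags` maps field name -> bool (the field's required flag).
    """
    if mode is Presence.NONE:
        demanded = set()
    elif mode is Presence.REQUIRED_FLAGS:
        demanded = {name for name, required in required_flags.items() if required}
    else:
        demanded = set(required_flags)
    return sorted(demanded - set(data))


# Hypothetical field table: only `title` is flagged as required.
FIELDS = {"title": True, "language": False, "version": False}
payload = {"title": "Foobar"}

patch_errors = missing_fields(payload, FIELDS, Presence.NONE)
post_errors = missing_fields(payload, FIELDS, Presence.REQUIRED_FLAGS)
put_errors = missing_fields(payload, FIELDS, Presence.STRICT)
```

The same payload passes under the PATCH and POST modes, while the strict PUT mode reports `language` and `version` as missing.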

You're doing a good job of arguing me out of this functionality there. :)

Differing "validation modes" when calling .is_valid() is an upheaval that's not going to fly.

We _could_ consider a 'complete=True' counterpart to the existing 'partial=True' init kwarg perhaps. That'd fit in easily enough with how things currently work and would still support the "strict fields" case.

Is the serializer the right place to solve this problem? This requirement is tightly coupled to the REST keywords so maybe that's the right place to enforce it. To support this approach, the serializer need only expose a list of fields that it accepts as inputs.

More of an aside... is there a good discussion of Django's separation (allocation) of concerns somewhere? I'm having trouble limiting myself to Django-friendly answers because I don't know the answer to questions like "why is validation part of serialization". The serialization docs for 1.9 don't even mention validation. And, strictly from first principles, it seems like:

  1. The model should be responsible for validating internal consistency and
  2. The "view" (in this case, the REST mode processor) should be responsible for enforcing business rules (like the RFC) related to that view.

If the responsibility for validation goes away, serializers can be 100% partial (by default) and specialized for I/O rules like "read only". A ModelSerializer built this way would support a wide variety of views.

Is the serializer the right place to solve this problem?

Yes.

The serialization docs for 1.9 don't even mention validation.

Django's built-in serialization isn't useful for Web APIs; it's really limited to dumping and loading fixtures.

You know the architectural assumptions of both Django and DRF better than I so I must defer to you on the how. Certainly an init kwarg has the right feel to it... reconfiguring the serializer "on-demand". The only limitation is that they can't be reconfigured "on the fly", but I assume the instances are single-use so this isn't a significant issue.

I'm going to de-milestone this for now. We can reassess after v3.7

Up to you guys, but I want to make sure you're clear that this is not a Ticket to add concurrency support. The real issue is that a single serializer cannot correctly validate both a PUT and POST in the current architecture. Concurrency just provided the "failing test".

TL;DR You can see why this issue is blocked by starting at Tom's proposed fix.

In summary, the proposed solution is to make all fields required for a PUT request. There are (at least) two problems with this approach:

  1. Serializers think in actions not HTTP methods so there isn't a one-to-one mapping. The obvious example is create because it's shared by PUT and POST. Note that create-by-PUT is disabled by default so the proposed fix is probably better than nothing.
  2. We don't need to require all fields in a PUT (a sentiment shared by #3648, #4703). If a nillable field is absent, we know it can be None. If a field with a default is absent, we know we can use the default. PUTs actually have the same (Model-derived) field requirements as POST.

The real issue is how we handle missing data, and the basic proposal in #3648, #4703, and here remains the right solution. We can support all of the HTTP modes (including create-by-PUT) if we introduce a concept like if_missing_use_default. My original proposal presented it as a replacement for partial, but it's easier (and may be necessary) to think of it as an orthogonal concept.
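A rough sketch of what an if_missing_use_default step could do, written as plain Python instead of a real serializer. `FieldSpec`, `fill_missing`, and the field table are all hypothetical. The precedence (declared default first, then None for nullable fields) is an assumption, not something the thread settles.

```python
MISSING = object()  # sentinel meaning "no default declared"


class FieldSpec:
    """Hypothetical description of one writable field."""

    def __init__(self, default=MISSING, nullable=False):
        self.default = default
        self.nullable = nullable


def fill_missing(data, fields):
    """Build a full replacement representation from a possibly sparse payload.

    Absent fields take their default, else None if nullable. A field with
    neither cannot be inferred and is reported as required.
    """
    result = dict(data)
    errors = []
    for name, spec in fields.items():
        if name in result:
            continue
        if spec.default is not MISSING:
            result[name] = spec.default
        elif spec.nullable:
            result[name] = None
        else:
            errors.append(name)
    if errors:
        raise ValueError("missing required fields: " + ", ".join(sorted(errors)))
    return result


BOOK_FIELDS = {
    "title": FieldSpec(),                  # no default, not nullable
    "language": FieldSpec(default="en"),
    "subtitle": FieldSpec(nullable=True),
}

full = fill_missing({"title": "Foobar"}, BOOK_FIELDS)
```

Under these assumptions a PUT carrying only `title` is accepted and resets `language` to its default, while a PUT that omits `title` is rejected. That matches the field requirements a POST would have.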

if we introduce a concept like if_missing_use_default.

There's nothing preventing anyone from implementing either this, or a strict "require all fields" as a base serializer class, and wrapping that up as a third party library.

My opinion is that a strict "require all fields" mode might also be able to make it into core. It's very clear, obvious behavior, and I can see why that'd be useful.

I'm not convinced by an "allow fields to be optional, but replace everything, using model defaults if they exist" mode. That seems like it'd present some very counter-intuitive behavior (eg. "created_at" fields, that automatically end up updating themselves). If we want a stricter behavior, we should just have a stricter behavior.

Either way around, the right way to approach this is to validate it as a third party package, then update our docs so we can link to that.

Alternatively, if you're convinced that we're missing a behavior from core that our users really do need, then you're welcome to make a pull request, updating the behavior and the documentation, so we can assess the merits in a very concrete way.

Happy to take pull requests as a starting point for this, and even happier to include a third party package demonstrating this behavior.

coming around to the value in having an option of PUT-is-strict behavior.

This still stands. I think we could consider that aspect in core, if someone cares enough about it to make a pull request along those lines. It'd need to be an optional behavior.

That seems like it'd present some very counter-intuitive behavior (eg. "created_at" fields, that automatically end up updating themselves).

A created_at field should be read_only (or excluded from the serializer). In both of these cases, it would be unchanged (the normal serializer behavior). In the counter-intuitive case that the field is not read-only in the serializer, you would get the counter-intuitive behavior of automatically changing it.
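The two cases can be sketched in plain Python. `replace_state` and the field tables are hypothetical, and the sketch assumes every writable field declares a default.

```python
def replace_state(instance, data, fields):
    """Replace the writable state of `instance` with `data`.

    `fields` maps name -> {"default": value, "read_only": bool}.
    Read-only fields are never written from a request. Writable fields
    absent from `data` fall back to their default (assumed to exist).
    """
    new_state = dict(instance)
    for name, spec in fields.items():
        if spec.get("read_only", False):
            continue
        new_state[name] = data.get(name, spec["default"])
    return new_state


book = {"title": "To be or 42", "language": "pl", "created_at": "2016-06-30"}

# Intended setup: created_at is read-only, so a replace leaves it alone.
SAFE_FIELDS = {
    "title": {"default": ""},
    "language": {"default": "en"},
    "created_at": {"default": "NOW", "read_only": True},
}
safe = replace_state(book, {"title": "Foobar"}, SAFE_FIELDS)

# Counter-intuitive setup: created_at is writable, so it is reset.
UNSAFE_FIELDS = dict(SAFE_FIELDS, created_at={"default": "NOW"})
unsafe = replace_state(book, {"title": "Foobar"}, UNSAFE_FIELDS)
```

With the read-only flag set, `created_at` keeps its stored value. Without it, the field is overwritten with the default, which is the surprising behavior described above.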

Happy to take pull requests as a starting point for this, and even happier to include a third party package demonstrating this behavior.

Absolutely. The "use defaults" variation is an ideal case for a 3rd party package because the change is a trivial wrapper around (one method of) the existing behavior and (if you buy into the defaults argument) works for all non-partial serializers.

tomchristie closed this 4 hours ago

Perhaps you'd consider adding a label like "PR Welcome" or "3rd Party Plugin" and leaving valid/acknowledged issues like this open. I often search open issues to see if a problem has already been reported and its progress towards resolution. I perceive closed issues as "invalid" or "fixed". Mixing a few "valid but closed" issues into the thousands of invalid/fixed issues doesn't invite efficient searching (even if you knew they might be there).

Perhaps you'd consider adding a label like "PR Welcome" or "3rd Party Plugin"

That'd be reasonable enough, but we'd like our issue tracker to reflect active or actionable work on the project itself.

It's really important for us to try to keep our issues tightly scoped. Changing priorities might mean that we sometimes choose to reopen issues that we've previously closed. Right now I think this has fallen out of the "core team want to address this in the immediate future" category.

If it comes up repeatedly, and there continues to be no third party solution, then perhaps we would reassess it.

leaving valid/acknowledged issues like this open.

A bit more context on the issue management style - https://www.dabapps.com/blog/sustainable-open-source-management/
