Django-filter: Filter for contrib.postgres JSONField

Created on 4 Jun 2016  ·  16 Comments  ·  Source: carltongibson/django-filter

This started in the Google discussion group:
https://groups.google.com/forum/#!topic/django-filter/RwNfoWsdeLQ

I'm interested in being able to filter contrib.postgres JSONFields with django-filter.

I have a filter that is working for a few examples. This is more complicated than I thought in that you don't really know the type of the data in your JSON ahead of time the way you do with something like an IntegerField. I may just be making it overly complicated.

Here's an example query that hits my JSONField:
http://127.0.0.1:8000/api/v1/craters?data=latitude:float:-57:lte~!@!~age:str:PC

Here are the models and filter code:
https://gist.github.com/jzmiller1/627071f555186cd1a58bb8f065205ff7
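
The gist itself is not reproduced here. As a rough sketch only, the `data=` parameter in the example URL could be decoded like this, assuming each clause has the shape `key:type:value[:lookup]` and clauses are joined by `~!@!~`. The function name `parse_data_param` and the `CASTS` table are hypothetical and not taken from the gist.

```python
from typing import Any, Callable, Dict

CLAUSE_SEP = "~!@!~"  # separator between clauses, as seen in the example URL
PART_SEP = ":"        # separator between key, type, value and optional lookup

# Hypothetical type names mapped to converters; only `float` and `str`
# appear in the example URL, `int` is added here as an assumption.
CASTS: Dict[str, Callable[[str], Any]] = {"float": float, "int": int, "str": str}


def parse_data_param(raw: str, field: str = "data") -> Dict[str, Any]:
    """Turn 'latitude:float:-57:lte~!@!~age:str:PC' into ORM-style filter kwargs."""
    kwargs: Dict[str, Any] = {}
    for clause in raw.split(CLAUSE_SEP):
        parts = clause.split(PART_SEP)
        if len(parts) not in (3, 4):
            raise ValueError(f"malformed clause: {clause!r}")
        key, type_name, value = parts[:3]
        lookup = parts[3] if len(parts) == 4 else "exact"
        if type_name not in CASTS:
            raise ValueError(f"unknown type: {type_name!r}")
        # 'exact' is the default lookup, so it is left out of the kwarg name.
        name = f"{field}__{key}" if lookup == "exact" else f"{field}__{key}__{lookup}"
        kwargs[name] = CASTS[type_name](value)
    return kwargs


kwargs = parse_data_param("latitude:float:-57:lte~!@!~age:str:PC")
```

The resulting dictionary could then be passed to a queryset as `.filter(**kwargs)`. Note that this sketch does not check that the lookup makes sense for the declared type, and values containing `:` would break the split.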

I'm going to continue messing around with it. If anyone has any thoughts or feedback please let me know...

All 16 comments

Hi @jzmiller1. What exactly are you trying to achieve? Can't tell if you're:

  • trying to create a generic JSONFilter that will allow you to query any arbitrary attribute inside a JSONField. or,
  • trying to expose specific attributes (latitude, age) that are common to your crater data. These attributes would essentially be your data's schema.

The former is interesting, but as you've found out, the complication is that a JSONField is inherently schemaless. Without a schema, you can't write code to automatically generate filters. Your MethodFilter works in that it allows any arbitrary attribute lookup, but you're unable to validate those lookups. E.g., ?data=latitude:char:PC:isnull is possible, but nonsensical. Whatever the solution, it is going to require a tradeoff: a completely arbitrary filter won't be able to validate the lookups, while a validating filter would require some way of providing a schema.

For the second case, the solution is verbose/tedious, but straightforward.

import django_filters as filters


class CratersFilter(filters.FilterSet):
    # django-filter 2.0 renamed the `name` argument to `field_name`.
    latitude = filters.NumberFilter(field_name='data__latitude', lookup_expr='exact')
    latitude__lt = filters.NumberFilter(field_name='data__latitude', lookup_expr='lt')
    latitude__lte = filters.NumberFilter(field_name='data__latitude', lookup_expr='lte')
    latitude__gt = filters.NumberFilter(field_name='data__latitude', lookup_expr='gt')
    latitude__isnull = filters.BooleanFilter(field_name='data__latitude', lookup_expr='isnull')
    # not sure if 'isnull' is a valid lookup for JSONFields - just demonstrating that
    # different lookups expect different value types.

    age = filters.CharFilter(field_name='data__age', lookup_expr='exact')
    ...

Your query would then look like:

http://127.0.0.1:8000/api/v1/craters?latitude__lte=-57&age=PC
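
One way to cut down on the tedium, sketched here under assumptions, is to write the schema down once and generate the filter declarations from it. The `SCHEMA` and `LOOKUPS` tables and the `FilterSpec` tuple below are hypothetical. The sketch only produces plain descriptions of filters; turning each one into a `NumberFilter` or `CharFilter` is left out.

```python
from typing import Dict, List, NamedTuple

# Hypothetical schema for the crater data: JSON key -> kind of value.
SCHEMA: Dict[str, str] = {"latitude": "number", "age": "string"}

# Assumed set of lookups that make sense for each kind of value.
LOOKUPS: Dict[str, List[str]] = {
    "number": ["exact", "lt", "lte", "gt", "gte"],
    "string": ["exact"],
}


class FilterSpec(NamedTuple):
    param: str        # query parameter name, e.g. 'latitude__lte'
    field_name: str   # model field path, e.g. 'data__latitude'
    lookup_expr: str  # lookup to apply, e.g. 'lte'
    kind: str         # which filter class would validate the value


def build_specs(json_field: str, schema: Dict[str, str]) -> List[FilterSpec]:
    """Expand a schema into one filter description per (key, lookup) pair."""
    specs: List[FilterSpec] = []
    for key, kind in schema.items():
        for lookup in LOOKUPS[kind]:
            # The 'exact' lookup is exposed under the bare key name.
            param = key if lookup == "exact" else f"{key}__{lookup}"
            specs.append(FilterSpec(param, f"{json_field}__{key}", lookup, kind))
    return specs


specs = build_specs("data", SCHEMA)
```

Because every generated parameter is tied to a kind, a nonsensical combination such as a string comparison on `latitude` is never exposed in the first place.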

My goal is to create a generic JSONFilter that will allow queries on any arbitrary attribute inside a JSONField. For what I'm working on, I won't really know what's inside the data for a particular crater, but if there is a key in there that I am looking for, I'd like to be able to query it.

As far as the inability to validate the lookup types goes I think I would be depending on the user making the query to realize that the query is nonsensical and just not make it to begin with.

I'm not sure if what I'm trying to do is a waste of time or not. There may be a better solution out there for what I'm trying to accomplish. I was curious if anyone saw major issues that would prevent this from being possible or if anyone had a use case where this would be useful. Thanks for taking a look at it!

My first thought is about the validation, as per @rpkilby's point. Schema-less is nice from a developer point of view — but I'm not sure you want it wired up directly to addressable URLs.

Let's keep this open for now. I can see it being a popular request. (So even to address it at the level of "Here's an example MethodFilter" in the docs would be worthwhile.)

For what I'm working on, I won't really know what's inside the data for a particular crater, but if there is a key in there that I am looking for, I'd like to be able to query it.

This seems... kind of funky. You're providing an API for crater data, but you don't know what's in the data that you're providing? Do you mean that some records will be missing attributes of a common schema, or that individual records are completely arbitrary?

I'm going to close this as Out of Scope for the moment. Happy to consider documented, tested pull requests. We may have capacity to reconsider in the future.

I think it would be really amazing to have a JSONFilter enabling queries such as jsonfield__a_random_key=value. I know you can do it with the objects.filter method. Perhaps the tradeoff could be the filter validation?

Just completed an implementation of 'natural' query for QuerySet filter using Q object. It's been unit tested against queryset with ~1000 records using a JsonField. Implementation is at:
https://github.com/shallquist/DJangoQuerySetFilter/blob/master/queryparser.py

Hey @shallquist, I'm not sure how to use QuerySetFilter in the context of django-filter. Did you document usage somewhere?

It's quite simple to use, as shown in the readme on GitHub. Normal queries should be supported, i.e.
QuerySetFilter('friends').get_Query((person__address__city = Denver | person__address__city = Boulder) & person__address_state ~= CO)
which will build a query to retrieve all friends who live in Denver or Boulder, Colorado, where friends is a JSONField.

BTW This hasn't been tested much and since Django filters do not support querying embedded array objects, I've abandoned this approach.

https://github.com/carltongibson/django-filter/issues/426#issuecomment-380224133

I think it would be really amazing to have a JSONFilter enabling queries such as jsonfield__a_random_key=value. I know you can do it with the objects.filter method. Perhaps the tradeoff could be the filter validation?

Hey @carltongibson, @rpkilby I'd like to get your thoughts on this. Let's say my_field is a postgres JSONField, and I want to:

  • Add a REST filter in the shape of my_field__etc=value where etc is any of the queries supported by JSONField and value whatever the REST user provides.
  • Then I'd like to pass etc and value to the model objects manager in the form of MyModel.objects.filter(my_field__etc=value).
  • Finally retrieve whatever the filter returns.

It seems super trivial, but I haven't figured out how to go about something like this. If you give me a bit of a clue, I could try to implement it.

Any thoughts would be super appreciated!

Does something like the following not work?

class MyFilter(FilterSet):
    my_field__etc = filters.NumberFilter(field_name='my_field', lookup_expr='etc')

In general, the field_name should match the underlying model field name, while transforms and lookups (a key transform in this case) should be contained in the lookup_expr.

@rpkilby thanks so much for such a quick response - Yeah exactly, but I want etc to be provided by the user in the request... So I couldn't really hardcode it in the filter 💭

The filter should look more like:

class MyFilter(FilterSet):
    my_field = JSONFieldFilter(field_name='my_field')

So, a single JSON filter to handle arbitrary query params like ?my_field__etc=value.

I see two problems. First, part of the value of django-filter is that it validates the query params. Because JSONFields have no schema, it's not possible to generate filters that appropriately validate the inbound data. E.g., if your JSON field has a "count" key, it wouldn't be possible to intuit that only positive numbers are valid. The best that could be done is to guarantee the value is valid JSON. So the queries would at least be valid, but possibly nonsensical (e.g., data__count__gt='cat').

The second is that this filter is going to have the same limitations as MultiWidget-based filters. e.g., it will not generate validation errors for the correct param names. But before diving into that, here's how I'd probably implement the filter. We need:

  • A filter class to perform the actual filtering, which should handle multiple params
  • A form field to validate the JSON data
  • A widget to get the data for the arbitrary my_field__* params.

from django.contrib.postgres import forms as postgres_forms
from django.db.models.constants import LOOKUP_SEP
from django.forms import widgets
from django_filters import filters
from django_filters.constants import EMPTY_VALUES


class JSONWidget(widgets.Textarea):
    """A widget that handles multiple parameters prefixed with the field name."""

    def value_from_datadict(self, data, files, name):
        prefix = f'{name}{LOOKUP_SEP}'

        # this is doing two things: 
        # - matches multiple params for the base field name
        # - in addition to returning the value, we also need the full parameter name
        #   for querying. otherwise, values will be filtered against the base `name`. 
        return {k: v for k, v in data.items() if k.startswith(prefix)}

    def get_context(self, name, value, attrs):
        # to support rendering the widget, you would need to generate subwidgets
        # similar to MultiWidget.get_context.
        pass

# on Django 3.1+, django.forms.JSONField replaces the contrib.postgres form field.
class JSONField(postgres_forms.JSONField):
    widget = JSONWidget

    def clean(self, value):
        # note that it's not possible to collect/reraise any validation errors under
        # their actual parameter names. `form.add_error` should be used here, however
        # the field class does not have access to the form instance. raising 
        # ValidationError({k: str(original_exc)}) also does not work. 

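        # clean/convert each value. zero-argument super() does not work inside a
        # comprehension before Python 3.12, so bind the parent method first.
        parent_clean = super().clean
        return {k: parent_clean(v) for k, v in value.items()}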

class JSONFilter(filters.Filter):
    field_class = JSONField

    def filter(self, qs, value):
        if value in EMPTY_VALUES:
            return qs
        return qs.filter(**value)

I haven't tested the above, but it should be roughly correct. However, there are limitations:

  • As far as I can tell, there's no way to correctly handle per-param ValidationErrors
  • Poor OpenAPI/CoreAPI schema support? Not sure what this would look like.
  • djangorestframework-filters isn't compatible with MultiWidget. This filter/widget would run into the same issues for the same reasons.
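
To make the first limitation concrete, here is a plain-Python sketch of the same collect-and-clean steps that keeps errors keyed by the full parameter name. It deliberately lives outside Django's form machinery; the function name `clean_json_params` is hypothetical, and how its result would be wired back into a FilterSet or its form is left open.

```python
import json
from typing import Any, Dict, Mapping, Tuple

LOOKUP_SEP = "__"  # the same separator Django uses between field names and lookups


def clean_json_params(
    params: Mapping[str, str], name: str
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Collect `name__*` params, JSON-decode each value, and report errors per param."""
    prefix = f"{name}{LOOKUP_SEP}"
    cleaned: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for key, raw in params.items():
        if not key.startswith(prefix):
            continue  # unrelated parameter, e.g. pagination
        try:
            cleaned[key] = json.loads(raw)
        except json.JSONDecodeError as exc:
            # The error stays attached to the parameter the client actually sent.
            errors[key] = f"Enter valid JSON: {exc.msg}"
    return cleaned, errors


cleaned, errors = clean_json_params(
    {"my_field__count__gt": "3", "my_field__tag": "cat", "page": "2"},
    "my_field",
)
```

Here `my_field__count__gt` decodes to the integer 3, while the unquoted `cat` is not valid JSON and ends up in `errors` under `my_field__tag`. A client would have to send `"cat"`, quotes included, for it to be accepted as a string.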

@rpkilby thanks so much for this thorough response.

if your JSON field has a "count" key, it wouldn't be possible to intuit that only positive numbers are valid.

This is a great point, the fact that we can't validate the type of query value makes it very challenging as MyModel.objects.filter(data__count="1") won't return the same as MyModel.objects.filter(data__count=1). As you say, there is no way to guess the type of the value from the query parameters.

That leaves only the option of embedding the type info in the query value, e.g. ?data__count=1:int to search for an integer and ?data__count=1:str for a string, and so on. But as suggested here, this is not recommended.

I now understand why it is so valuable to explicitly define the filters. Nevertheless, I'll give your suggestion a try! Thanks again
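
For comparison with a type suffix, a JSON-decoding form field, like the one sketched earlier in the thread, would recover the type from the value itself, provided the client sends a JSON literal. A small illustration using only the standard library (the parameter name `data__count` comes from the discussion above; the rest is an assumption about how a client might encode the value):

```python
import json
from urllib.parse import quote

# A bare 1 decodes to an integer, a quoted "1" decodes to a string.
as_int = json.loads('1')
as_str = json.loads('"1"')

# The quotes have to be percent-encoded when placed in a URL.
int_query = f"?data__count={quote('1')}"
str_query = f"?data__count={quote(json.dumps('1'))}"
```

The two decoded values are different Python objects of different types, which is exactly the distinction that matters when they are passed on to a JSONField lookup.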

@rpkilby, I have a similar need.

I've a config table like this with two columns

meta_structure of type jsonb (This column has info like key1 of type string, key2 of type integer)

I've another table named config_data which has 3 columns.

config_id -> Foreign key to config table
meta_info -> jsonb type

Note: The tables mentioned above are not the exact tables. They're just the representative versions to convey the message.

I currently validate the fields in meta_info before saving by checking that they match the config table.

The need is that I want to filter using the meta_info column of the config_data table. Eg. meta_info__key1='abc'. (key1 can be anything)

I was trying to use the approach you gave above, but the problem is that I don't see how to use the JSONFilter class you created.

Eg.

class ConfigDataFilterSet(django_filters.FilterSet):
    meta_info = JSONFilter(field_name='meta_info')

pp = ConfigDataFilterSet(data={'meta_info__key1': 'abc'})

Now, if I run pp.qs or pp.filter_queryset(), it won't actually apply the filter on the meta_info field, because the field name assigned in the ConfigDataFilterSet class is meta_info. Can you help me overcome this hurdle?
