Declarative Validation in Python

hprotagonist · on May 4, 2021

pydantic does a very tidy job at this:

https://pydantic-docs.helpmanual.io/usage/validators/

_6pvr · on May 4, 2021

> def validate_age(age, errors):

> if not isinstance(age, int):

> errors.append("age must be an int")

I can't really get past the concept of writing tests to validate types like this. What's the point of using a dynamically typed language just to perform typechecking in tests? Why not just use a typed language at that point?

the_duke · on May 5, 2021

The post shows a style of writing validations for untrusted input, like form data or incoming API requests.

The problem is orthogonal to the type of language used, you have to validate and convert data in Rust just as much as you have to in Python.

nerdponx · on May 5, 2021

If anything, static typing forces you to write these validations in the process of deserializing data, otherwise you couldn't construct instances of the necessary types in the first place.

stavros · on May 4, 2021

Or why not just use the optional types that have been there for a decade?

    def validate_age(age: int, errors: List[Error]):

Install Typeguard if you want runtime type checking, otherwise use mypy. Done.
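To show roughly what "runtime type checking" means without depending on Typeguard itself, here is a hypothetical, much-simplified stand-in decorator (`check_types` is made up for this sketch; it is not Typeguard's API or implementation). It reads the function's annotations and raises `TypeError` when an argument or the return value is not an instance of a plain-class annotation. Generics such as `List[str]` are skipped, not checked.

```python
import functools
import inspect
import typing


def check_types(func):
    """Hypothetical, much-simplified stand-in for a runtime type checker.

    Only plain-class annotations (int, str, a user class, ...) are checked;
    generics such as List[str] or Optional[int] are skipped, not verified.
    """
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    def check(name, value, expected):
        # get_origin() is None for plain classes, non-None for List[str] etc.
        if typing.get_origin(expected) is None and isinstance(expected, type):
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if name in hints:
                check(name, value, hints[name])
        result = func(*args, **kwargs)
        if "return" in hints:
            check("return value", result, hints["return"])
        return result

    return wrapper


@check_types
def double_age(age: int) -> int:
    return age * 2


print(double_age(21))  # 42
try:
    double_age("21")  # without the decorator this would quietly return "2121"
except TypeError as exc:
    print(exc)  # age must be int, got str
```

The annotations do double duty: mypy reads them statically, and the decorator enforces them when the code actually runs.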

_6pvr · on May 4, 2021

Sure - I'm not really familiar with the Python ecosystem or the status of typed Python. It seems like we mostly agree that building a typechecking system via unit tests isn't an excellent use of time, though. I was curious if maybe I'd missed something, but it seems that Python has provided tooling to explicitly not do what the post was describing for quite a while, so maybe not.

stavros · on May 4, 2021

Oh yeah, checking types the way the author was doing it is explicitly an antipattern. If you want to do duck typing, you should check that the object acts like you expect (e.g. that it has a defined "addition" operation, if you want to add it to something), rather than that it's of a specific type.

If you want to check for a specific type, you should use the static type system.
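A small sketch of the duck-typing style described above, using a hypothetical `add` helper: it never checks types, it just attempts the operation. Anything whose types define `+` for each other works, and a failure is only translated into a clearer message.

```python
from decimal import Decimal
from fractions import Fraction


def add(a, b):
    """Duck-typed addition: no isinstance check, just try the operation."""
    try:
        return a + b
    except TypeError:
        # Re-raise with a message naming both operand types.
        raise TypeError(
            f"cannot add {type(a).__name__} and {type(b).__name__}"
        ) from None


print(add(1, 2))               # 3
print(add(Fraction(1, 2), 1))  # 3/2
print(add(Decimal("0.1"), 2))  # 2.1
print(add("ab", "cd"))         # abcd
try:
    add(1, "a")
except TypeError as exc:
    print(exc)                 # cannot add int and str
```

An `isinstance(a, int)` check would have rejected the `Fraction`, `Decimal` and `str` cases even though they add just fine.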

rbanffy · on May 5, 2021

You may want to explicitly test for a type depending on the interface contract. If you build software for consumption by others, you may want to ensure that, whatever the inputs, it’ll always return something you said would be returned. While the consumer would be insane not to verify what you sent, it’s polite not to send a complex number when someone is expecting a count.

stavros · on May 5, 2021

Certainly, but you're typically much more in control of what you send than what you receive.

utucuro · on May 6, 2021

The post is not about what you're sending; it is about making sure what you're receiving is what you expect. The motivation begins with: "Many of our programs accept input from the user. Often we need to validate this input before continuing processing and, in the case of errors, inform the user of any problems."

stavros · on May 6, 2021

The post I was replying to is about what you send, though.

rbanffy · on May 6, 2021

You may also want to do sanity checks but, in those cases, I tend to deliberately abuse the inputs. Send a string, a byte string, some invalid UTF-8, an emoji, an invoice, and ensure that the thing can take more abuse than what the manual says.

If, however, the manual says “integer” then it should complain loudly the user is breaking the contract.
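A minimal sketch of that kind of abuse test, assuming a hypothetical `parse_count` whose manual says "non-negative integer". Every bad input should produce a loud `TypeError` or `ValueError`, never silent acceptance and never some unrelated crash. The helper name `find_contract_breaches` and the input list are illustrative, not from any library.

```python
def parse_count(value):
    """Hypothetical function whose manual says: value is a non-negative int."""
    # Strict on purpose: type() rather than isinstance(), so bool (an int
    # subclass) is rejected along with everything else that isn't an int.
    if type(value) is not int:
        raise TypeError(f"count must be an int, got {type(value).__name__}")
    if value < 0:
        raise ValueError(f"count must be >= 0, got {value}")
    return value


# Deliberately abusive inputs: b"\xff\xfe" is invalid UTF-8,
# and the dict stands in for "an invoice".
ABUSE = ["12", b"12", b"\xff\xfe", "🎂", {"invoice": 42}, None, True, 1.5, -3]


def find_contract_breaches(func, inputs):
    """Return inputs that were silently accepted or caused an unexpected crash."""
    problems = []
    for value in inputs:
        try:
            func(value)
        except (TypeError, ValueError):
            continue  # a loud, documented complaint: exactly what we want
        except Exception as exc:  # anything else means the function itself broke
            problems.append((value, repr(exc)))
        else:
            problems.append((value, "silently accepted"))
    return problems


print(find_contract_breaches(parse_count, ABUSE))  # []
```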

stavros · on May 6, 2021

> If, however, the manual says “integer” then it should complain loudly the user is breaking the contract.

Yep, that's why I prefer mypy/typeguard and type annotations for those sorts of things over checking in the body. It's frequently much, much easier to read, and self-documenting, since most IDEs have support for that when calling the function nowadays.

_6pvr · on May 4, 2021

Gotcha - that makes sense and is what I would've expected. Thanks for the info!

geofft · on May 4, 2021

The way the post presents things is a little bit confusing. It does go on to present actual validation, which you'd still need to do in most typed languages, like checking for age < 10.

However, despite talking about getting "input from the user," it then accepts in-language objects and validates them. If it were accepting, say, strings in all cases, e.g.

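    from typing import List, Optional

    def validate_age(raw_age: str, errors: List[str]) -> Optional[int]:
        try:
            age = int(raw_age)
        except ValueError as e:
            errors.append(str(e))
            return None  # conversion failed, so there is no age to range-check
        if age < 10:
            errors.append("age must be at least 10")
        return age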

then I think the listed approach would make sense. It's not doing isinstance checks - it's doing conversions, which you'd have to do regardless.

But accepting an object that should be an integer, and then checking that it actually is an integer, doesn't seem like input from the user:

    from typing import List

    def validate_age(age: int, errors: List[str]) -> int:
        if not isinstance(age, int):
            raise TypeError("Did someone forget to run mypy???")
        ...

I suppose the approach mentioned in this article would be valid if the input were JSON or something. Then you really would have language-level objects of potentially int, str, or other types, and you'd need to check which particular type you have at runtime, and the function would legitimately be age: Any, not age: int.
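A rough sketch of that JSON case, where the annotation honestly is `age: Any` and the function collects messages instead of raising. The field name and the "at least 10" rule follow the examples above; the rest is assumed for illustration. One Python-specific wrinkle: `json.loads("true")` gives `True`, and `bool` is a subclass of `int`, so it has to be excluded explicitly.

```python
import json
from typing import Any, List, Optional


def validate_age(age: Any, errors: List[str]) -> Optional[int]:
    """Validate an age taken from decoded JSON, where it may be any JSON type."""
    # json.loads can yield int, float, str, bool, None, list or dict here.
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        errors.append(f"age must be an integer, got {type(age).__name__}")
        return None
    if age < 10:
        errors.append("age must be at least 10")
        return None
    return age


payload = json.loads('{"age": "12"}')
errors: List[str] = []
validate_age(payload.get("age"), errors)
print(errors)  # ['age must be an integer, got str']
```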

Also, if you were really embracing the type system, you'd return a ValidAge object, a newtype around an integer at least 10 (for this application), instead of just an int. That would let you statically make sure that any ValidAge has already been checked. I think you can mostly emulate this in Python with

    import typing
    from typing import List

    ValidAge = typing.NewType("ValidAge", int)

    def validate_age(raw_age: str, errors: List[str]) -> ValidAge:
        ...
        return ValidAge(age)

except that I don't think Python gives you the privacy features to prevent someone else from constructing a ValidAge.
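One way to get closer, sketched under the assumption that a runtime check in the constructor is acceptable: make `ValidAge` a real subclass of `int` rather than a `typing.NewType` (which does no checking at runtime), so the check runs whenever the class is called. It is not true privacy, since `int.__new__(ValidAge, 5)` still bypasses it, but an invalid `ValidAge` can no longer be created by accident, and mypy still treats it as a distinct type.

```python
class ValidAge(int):
    """An int guaranteed to be >= 10 (this application's rule)."""

    MINIMUM = 10

    def __new__(cls, value: int) -> "ValidAge":
        if not isinstance(value, int):
            raise TypeError(f"ValidAge needs an int, got {type(value).__name__}")
        if value < cls.MINIMUM:
            raise ValueError(f"age must be at least {cls.MINIMUM}, got {value}")
        return super().__new__(cls, value)


age = ValidAge(12)
print(age, isinstance(age, int))  # 12 True
print(type(age + 1).__name__)     # int: arithmetic drops the guarantee
try:
    ValidAge(5)
except ValueError as exc:
    print(exc)                    # age must be at least 10, got 5
```

The `age + 1` line is a feature here: derived values come back as plain `int`, so they have to be validated again before they can be passed where a `ValidAge` is expected.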

BTW, your post made me realize why I don't like "types" as runtime programming-by-contract: the whole point of a type system is that it's Turing-incomplete, and therefore you can prove things about it statically (i.e., at/near compile time, not at runtime) without having to actually execute the program. Validation (accepting an unknown input from outside of the program, i.e. outside of what's visible to the type checker, which is generally just one program's source code) is a task you must do at runtime, and therefore its runtime failure is something you already need to write error handling for. But regular old type checking for proper use of APIs is not, and writing any sort of runtime error handling for its failure always feels weird, because you shouldn't report the error to the user; you should report it to the developer.

wodenokoto · on May 5, 2021

Because it isn't for type checking; it's for validating data and producing a list of validation errors (not stopping at the first error).

Those are two things that type checking won't give you.
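A minimal sketch of the "collect every error" side of that, assuming a hypothetical form of string fields and a made-up field-to-validator table. Every validator runs even after an earlier one fails, so the user sees all problems at once.

```python
from typing import Callable, Dict, List


def validate_name(raw: str, errors: List[str]) -> None:
    if not raw.strip():
        errors.append("name must not be empty")


def validate_age(raw: str, errors: List[str]) -> None:
    # isdecimal() (unlike isdigit()) only accepts characters int() can parse.
    if not raw.isdecimal():
        errors.append("age must be a whole number")
    elif int(raw) < 10:
        errors.append("age must be at least 10")


# Hypothetical form schema: field name -> validator.
VALIDATORS: Dict[str, Callable[[str, List[str]], None]] = {
    "name": validate_name,
    "age": validate_age,
}


def validate_form(form: Dict[str, str]) -> List[str]:
    """Run every validator and return all problems, not just the first."""
    errors: List[str] = []
    for field, validator in VALIDATORS.items():
        if field not in form:
            errors.append(f"{field} is required")
        else:
            validator(form[field], errors)
    return errors


print(validate_form({"name": " ", "age": "7"}))
# ['name must not be empty', 'age must be at least 10']
```

A type checker would see `str` for both fields and be satisfied; neither the blank name nor the too-small age is a type error.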