Usage

Constructing Specs

To begin using the dataspec library, you can simply import the dataspec.s object:

from dataspec import s

s() is a generic Spec constructor, which can be called to construct new Specs from a variety of sources. It is a singleton instance of dataspec.SpecAPI, and nearly all of the factory and convenience methods below are available as static methods on s.

Specs are designed to be composed, so each of the spec types below can serve as the base for more complex data definitions. For collection, mapping, and tuple type Specs, Specs will be recursively created for child elements if they are types understood by s().

Specs may also optionally be created with Tags, which are simply string names included in the dataspec.ErrorDetails objects emitted by the dataspec.Spec.validate() method. For s(), tags may be provided as the first positional argument. Every Spec is required to have a tag, so all builtin Spec factories supply a default tag if one is not given.

Validation

Once you’ve constructed your Spec, you’ll most likely want to begin validating data with that Spec. The dataspec.Spec interface provides several different ways to check that your data is valid given your use case.

The simplest way to validate your data is by calling dataspec.Spec.is_valid(), which returns True if your data is valid and False otherwise. A yes-or-no answer may be sufficient in some cases, but in others you may want to know exactly why the data you provided is invalid. For those cases, you can turn to the generator dataspec.Spec.validate(), which emits successive dataspec.ErrorDetails instances describing the errors in your input value.

dataspec.ErrorDetails instances include comprehensive details about why your input data did not meet the Spec, including an error message, the predicate that rejected the value, and the value itself. The via attribute is a list of all Spec tags that validated your data up to (and including) the error. For nested values, the path attribute lists the indices and keys that lead from the input value to the failing value. These details can be used to programmatically emit useful error messages to clients.
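As a sketch of turning error details into client-facing messages, the function below groups each error's message under a dotted location built from its path. To keep the example runnable without dataspec installed, it uses a hypothetical stand-in dataclass carrying only the message, value, path, and via attributes described above; with dataspec you would pass the ErrorDetails objects produced by validate() instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class StandInErrorDetails:
    # Hypothetical stand-in mirroring the ErrorDetails attributes described above
    message: str
    value: Any
    path: List[Any] = field(default_factory=list)
    via: List[str] = field(default_factory=list)


def errors_by_location(errors: Iterable[StandInErrorDetails]) -> Dict[str, List[str]]:
    """Group error messages by a dotted path such as "addresses.0.zip"."""
    grouped: Dict[str, List[str]] = {}
    for err in errors:
        # An empty path means the error applies to the input value itself
        location = ".".join(str(p) for p in err.path) or "<root>"
        grouped.setdefault(location, []).append(err.message)
    return grouped


errors = [
    StandInErrorDetails("Value must be a string", 12345, path=["addresses", 0, "zip"]),
    StandInErrorDetails("Value must be a mapping", None),
]
print(errors_by_location(errors))
# {'addresses.0.zip': ['Value must be a string'], '<root>': ['Value must be a mapping']}
```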

Note

For convenience, you can fetch all of the errors at once as a list using dataspec.Spec.validate_all() or raise an exception with all of the errors using dataspec.Spec.validate_ex().

Warning

dataspec will emit an exhaustive list of every instance where your input data fails to meet the Spec, so if you do not require a full list of errors, you may want to consider using dataspec.Spec.is_valid() or using the generator method dataspec.Spec.validate() to fetch errors as needed.
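Because validate() is a generator, standard iterator tools let you stop after the first error or the first few. The sketch below uses a hypothetical stand-in generator in place of a real Spec so it runs on its own; next() and itertools.islice consume only as many errors as you ask for.

```python
from itertools import islice
from typing import Iterator, List


def validate_all_positive(values: List[int]) -> Iterator[str]:
    # Hypothetical stand-in for Spec.validate(): yields one message per bad element
    for i, v in enumerate(values):
        if v <= 0:
            yield f"element {i} must be greater than 0, got {v}"


values = [5, -1, 0, -7, 3]

# next(..., None) returns None when there are no errors at all
first_error = next(validate_all_positive(values), None)
# islice stops the generator after two errors instead of checking every element
first_two = list(islice(validate_all_positive(values), 2))

print(first_error)  # element 1 must be greater than 0, got -1
print(first_two)    # ['element 1 must be greater than 0, got -1', 'element 2 must be greater than 0, got 0']
```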

Conformation

Data validation is only one half of the value proposition for using dataspec. After you've confirmed that your data is valid, the next step is to normalize it into a canonical format. Conformers are functions of one argument that accept a validated value and return a canonical representation of that value. Conformation is the component of dataspec that helps you normalize data.

Every Spec value comes with a default conformer. For most Specs, that conformer simply returns the value it was passed, though a few builtin Specs do provide a richer, canonicalized version of the input data. For example, s.date() conforms a date (possibly from a strptime format string) into a date object. Note that none of the builtin Spec conformers ever modify the data they are passed. dataspec conformers always create new data structures and return the conformed values. Custom conformers can modify their data in-flight, but that is not recommended since it makes failures harder to reason about (in particular, if a mutating conformer appeared in the middle of an s.all(...) Spec and a later Spec produced an error).

Most common Spec workflows will involve validating that your data is, in fact, valid using dataspec.Spec.is_valid() or dataspec.Spec.validate() for richer error details and then calling dataspec.Spec.conform_valid() if it is valid or dealing with the error if not.

User Provided Conformers

When you create Specs, you can always provide a conformer using the conformer keyword argument. This function will be called any time you call dataspec.Spec.conform() on your Spec or any Spec your Spec is a part of. The conformer keyword argument for s() and other builtin factories will always apply your conformer as by dataspec.Spec.compose_conformer(), rather than replacing the default conformer. To have your conformer completely replace the default conformer (if one is provided), you can use the dataspec.Spec.with_conformer() method on the returned Spec.
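The difference between composing and replacing can be sketched with plain functions. Here a hypothetical default_conformer stands in for a date Spec's default conformer (parsing an ISO string into a date), and user_conformer extracts the year. Composing, which is how the conformer keyword argument is described above, feeds the default conformer's output into yours; replacing, as with with_conformer(), hands your conformer the raw input value.

```python
from datetime import date
from typing import Any, Callable

Conformer = Callable[[Any], Any]


def default_conformer(v: str) -> date:
    # Hypothetical stand-in for a date Spec's default conformer
    return date.fromisoformat(v)


def user_conformer(d: date) -> int:
    # Expects a date object, so it relies on the default conformer running first
    return d.year


def compose(first: Conformer, second: Conformer) -> Conformer:
    # Apply `first`, then feed its result to `second`
    return lambda v: second(first(v))


composed = compose(default_conformer, user_conformer)
print(composed("1980-09-14"))  # 1980

replaced = user_conformer  # the default conformer no longer runs
try:
    replaced("1980-09-14")
except AttributeError as exc:
    print("replaced conformer received a str:", exc)
```

If your conformer only makes sense on the default conformer's output, composing is the safer choice; replacing is useful when you want full control over the raw value.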

Predicate and Validators

Not every valid state of your data can be specified using existing Specs, so you can also define a Spec from any simple predicate by passing the predicate directly to the s() function.

import uuid

spec = s(lambda id_: uuid.UUID(id_).version == 4)
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b")  # True
spec.is_valid("b4e9735a-ee8c-11e9-8708-4c327592fea9")  # False

Simple predicates make fine specs, but are unable to provide more details to the caller about exactly why the input value failed to validate. Validator specs directly yield dataspec.ErrorDetails objects which can indicate more precisely why the input data is failing to validate.

from typing import Any, Iterable

from dataspec import ErrorDetails, s

def _is_positive_int(v: Any) -> Iterable[ErrorDetails]:
    if not isinstance(v, int):
        yield ErrorDetails(
            message="Value must be an integer", pred=_is_positive_int, value=v
        )
    elif v < 1:
        yield ErrorDetails(
            message="Number must be greater than 0", pred=_is_positive_int, value=v
        )

spec = s(_is_positive_int)
spec.is_valid(5)      # True
spec.is_valid(0.5)    # False
spec.validate_ex(-1)  # ValidationError(errors=[ErrorDetails(message="Number must be greater than 0", ...)])

Simple predicates can be converted into validator functions using the builtin dataspec.pred_to_validator() decorator:

from typing import Union

from dataspec import pred_to_validator, s

@pred_to_validator("Number must be greater than 0")
def _is_positive_num(v: Union[int, float]) -> bool:
    return v > 0

spec = s(_is_positive_num)
spec.is_valid(5)      # True
spec.is_valid(0.5)    # True
spec.validate_ex(-1)  # ValidationError(errors=[ErrorDetails(message="Number must be greater than 0", ...)])
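Conceptually, such a decorator wraps a boolean predicate in a generator that yields an error object whenever the predicate returns False. The sketch below is a hypothetical re-implementation for illustration only (named pred_to_validator_sketch, with a stand-in error type); it is not dataspec's actual code, which may support additional options.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Iterator, Union


@dataclass
class StandInErrorDetails:
    # Hypothetical stand-in for dataspec.ErrorDetails
    message: str
    pred: Callable[..., Any]
    value: Any


def pred_to_validator_sketch(message: str):
    def decorator(pred: Callable[[Any], bool]) -> Callable[[Any], Iterator[StandInErrorDetails]]:
        @wraps(pred)
        def validator(v: Any) -> Iterator[StandInErrorDetails]:
            # Yield exactly one error when the predicate rejects the value
            if not pred(v):
                yield StandInErrorDetails(message=message, pred=validator, value=v)

        return validator

    return decorator


@pred_to_validator_sketch("Number must be greater than 0")
def is_positive_num(v: Union[int, float]) -> bool:
    return v > 0


print(list(is_positive_num(0.5)))                # []
print([e.message for e in is_positive_num(-1)])  # ['Number must be greater than 0']
```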

Type Specs

You can define a Spec that validates input values are instances of specific class types by simply passing a Python type directly to the s() constructor:

spec = s(str)
spec.is_valid("a string")  # True
spec.is_valid(3)           # False

Note

s(None) is a shortcut for s(type(None)).

Factories

The s API also includes several Spec factories for common Python types such as bool, bytes, date, datetime (via s.inst()), float (via s.num()), int (via s.num()), str, time, and uuid.

s also includes several pre-built Specs for basic types which are useful if you only want to verify that a value is of a specific type. All the pre-built Specs are supplied as s.is_{type} on s. You can generate a more generic type-checking spec using Type Specs.

String Specs

You can create a Spec which validates strings with s.str(). Common string validations can be specified as keyword arguments, such as the min/max length or a matching regex. If you only need to validate that a value is a string, dataspec provides the predefined Spec s.is_str (note that no function call is required).

Numeric Specs

Likewise, numeric specs can be created using s.num(), with several builtin validations available as keyword arguments such as min/max value and narrowing down the specific numeric types. If you are only interested in validating that a value is numeric, you can use the builtin s.is_num or s.is_int or s.is_float specs.

UUID Specs

In a previous section, we used a simple predicate to check that a UUID was a certain version of an RFC 4122 variant UUID. However, dataspec includes the builtin UUID spec factory s.uuid() which can simplify the logic here:

spec = s.uuid(versions={4})
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b")  # True
spec.is_valid("b4e9735a-ee8c-11e9-8708-4c327592fea9")  # False

Additionally, if you only need to validate that a value is a UUID, the builtin Spec s.is_uuid is available.

Time and Date Specs

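dataspec includes some builtin Specs for Python’s datetime, date, and time classes. With the builtin Specs, you can validate that values of any of these three types fall before or after a given value. Suppose you want to verify that someone is at least 18 by checking their date of birth. An adult was born no later than the date 18 years ago, so the check uses before:

from datetime import date

today = date.today()  # caution: .replace() raises ValueError on Feb 29 if the target year is not a leap year
spec = s.date(before=today.replace(year=today.year - 18))
spec.is_valid(today.replace(year=today.year - 21))  # True
spec.is_valid(today.replace(year=today.year - 12))  # False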
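Python's datetime.timedelta has no years argument, and a fixed number of days drifts across leap years, so an 18-year cutoff is best computed from calendar years. The hypothetical helper below does that, mapping February 29 to February 28 when the target year is not a leap year (a policy choice you may want to change).

```python
import calendar
from datetime import date


def years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` years earlier, clamping Feb 29 to Feb 28."""
    target_year = d.year - years
    if d.month == 2 and d.day == 29 and not calendar.isleap(target_year):
        return date(target_year, 2, 28)
    return d.replace(year=target_year)


print(years_before(date(2024, 2, 29), 18))  # 2006-02-28
print(years_before(date(2024, 6, 1), 18))   # 2006-06-01
```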

For datetimes (instants) and times, you can also pass is_aware=True to require that the value be timezone-aware (i.e. not naive).

You can use the builtins s.is_date, s.is_inst, and s.is_time if you only want to validate that a value is an instance of any of those classes.

Note

dataspec supports specs for arbitrary date strings if you have python-dateutil installed. See s.inst_str() for info.

Phone Number Specs

dataspec supports creating Specs for validating telephone numbers from strings using s.phone() if you have the phonenumbers library installed. Telephone number Specs can validate that a telephone number is merely formatted correctly, or they can validate that a telephone number is both possible and valid (via phonenumbers).

spec = s.phone(region="US")
spec.is_valid("(212) 867-5309")  # True
spec.conform("(212) 867-5309")   # "+12128675309"
spec.is_valid("(22) 867-5309")   # False

Email Address and URL Specs

dataspec features Spec factories for validating email addresses using s.email() and URLs using s.url().

Email addresses are validated using Python’s builtin email.headerregistry.Address class to parse email addresses into username and domain. For each of username and domain, you may validate that the value is an exact match, is one of a set of possible matches, or matches a regex pattern. To produce a Spec which only validates email addresses from gmail.com or googlemail.com:

spec = s.email(domain_in={"gmail.com", "googlemail.com"})
spec = s.email(domain_regex=r"(gmail|googlemail)\.com")
spec = s.email(domain="gmail.com")  # Don't allow "googlemail.com" email addresses

At most one of these keyword filters may be supplied for each of username and domain.
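To see what the username and domain filters operate on, here is the standard library parsing step on its own. The email_domain_in function is a hypothetical illustration of a domain_in-style check built on email.headerregistry.Address; it is not dataspec's implementation.

```python
from email.errors import HeaderParseError
from email.headerregistry import Address
from typing import AbstractSet


def email_domain_in(value: str, domains: AbstractSet[str]) -> bool:
    # Hypothetical sketch of a domain_in-style filter
    try:
        address = Address(addr_spec=value)
    except (ValueError, HeaderParseError):
        # Unparseable addresses (e.g. no "@domain" part) are rejected
        return False
    return address.domain in domains


addr = Address(addr_spec="jane.doe@gmail.com")
print(addr.username, addr.domain)  # jane.doe gmail.com
print(email_domain_in("jane.doe@gmail.com", {"gmail.com", "googlemail.com"}))  # True
print(email_domain_in("jane.doe@yahoo.com", {"gmail.com", "googlemail.com"}))  # False
print(email_domain_in("not-an-email", {"gmail.com"}))                          # False
```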

URLs are validated using Python’s builtin urllib module to parse URLs into their constituent components: scheme, netloc, path, params, fragment, username, password, hostname, and port. URL Specs may optionally provide a Spec for the dict created by parsing the query string (if present) for the URL. Specs for each of the components of a URL allow the same filters as described above for email addresses. For more information, see s.url().
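The components listed above correspond to attributes of the result of urllib.parse.urlparse(), and urllib.parse.parse_qs() is one standard way to turn a query string into a dict (we assume, without confirming, that this is the dict URL Specs receive). The snippet shows only the standard library parsing step, not how dataspec applies its filters.

```python
from urllib.parse import parse_qs, urlparse

url = "https://user:secret@example.com:8443/search;v=1?q=dataspec&page=2#results"
parts = urlparse(url)

print(parts.scheme)                    # https
print(parts.netloc)                    # user:secret@example.com:8443
print(parts.path)                      # /search
print(parts.params)                    # v=1
print(parts.fragment)                  # results
print(parts.username, parts.password)  # user secret
print(parts.hostname, parts.port)      # example.com 8443

# parse_qs maps each query key to a list of values
print(parse_qs(parts.query))  # {'q': ['dataspec'], 'page': ['2']}
```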

Enumeration (Set) Specs

Commonly, you may be interested in validating that a value is one of a constrained set of known values. In Python code, you would use an Enum type to model these values. To define an enumeration Spec, you can pass an existing Enum class to dataspec.s():

from enum import Enum

class YesNo(Enum):
    YES = "Yes"
    NO = "No"

s(YesNo).is_valid("Yes")    # True
s(YesNo).is_valid("Maybe")  # False

Any representation of an Enum member satisfies the Spec, including the member's value (like "No"), its name (like "NO"), and the member itself (like YesNo.NO).

Additionally, for simpler cases you can specify an enumeration using a Python set (or frozenset):

s({"Yes", "No"}).is_valid("Yes")    # True
s({"Yes", "No"}).is_valid("Maybe")  # False
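Conceptually, an Enum Spec accepts several representations of a member and can conform each to the member itself (the conform-to-member step is an assumption of this sketch). The hypothetical conform_enum below shows that lookup with the standard enum module: it accepts a member, then tries lookup by value, then by name. dataspec's own conformer may resolve representations differently.

```python
from enum import Enum
from typing import Any, Type, TypeVar

E = TypeVar("E", bound=Enum)


class YesNo(Enum):
    YES = "Yes"
    NO = "No"


def conform_enum(enum_cls: Type[E], v: Any) -> E:
    # Hypothetical sketch: resolve a member from a member, a value, or a name
    if isinstance(v, enum_cls):
        return v
    try:
        return enum_cls(v)  # lookup by value, e.g. "Yes"
    except ValueError:
        return enum_cls[v]  # lookup by name, e.g. "YES"; raises KeyError if unknown


print(conform_enum(YesNo, "Yes"))     # YesNo.YES
print(conform_enum(YesNo, "NO"))      # YesNo.NO
print(conform_enum(YesNo, YesNo.NO))  # YesNo.NO
```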

Collection Specs

Specs can be defined for values in homogeneous collections as well. To define a Spec for a homogeneous collection, pass a list to dataspec.s() whose first element is the Spec for the collection elements:

s([s.num(min_=0)]).is_valid([1, 2, 3, 4])  # True
s([s.num(min_=0)]).is_valid([-11, 2, 3])   # False

You may also want to assert certain conditions that apply to the collection as a whole. dataspec allows you to specify an optional dictionary as the second element of the list with a few possible rules applying to the collection as a whole, such as length and collection type.

s([s.num(min_=0), {"kind": list}]).is_valid([1, 2, 3, 4])  # True
s([s.num(min_=0), {"kind": list}]).is_valid({1, 2, 3, 4})  # False

Collection specs conform input collections by applying the element conformer(s) to each element of the input collection. Callers can specify an "into" key in the collection options dictionary as part of the spec to specify which type of collection is emitted by the collection spec default conformer. Collection specs which do not specify the "into" collection type will conform collections into the same type as the input collection.
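The conformation rule above can be sketched in plain Python: apply the element conformer to each element, then build the result as either the requested into type or the input's own type. conform_collection is a hypothetical illustration, not dataspec's implementation.

```python
from typing import Any, Callable, Iterable, Optional, Type


def conform_collection(
    values: Iterable[Any],
    conform_element: Callable[[Any], Any],
    into: Optional[Type] = None,
) -> Any:
    # Without "into", rebuild the same collection type as the input
    target = into if into is not None else type(values)
    return target(conform_element(v) for v in values)


print(conform_collection([1, 2, 3], str))  # ['1', '2', '3']
print(conform_collection((1, 2, 3), str))  # ('1', '2', '3')
print(conform_collection([1, 2, 2, 3], str, into=set) == {"1", "2", "3"})  # True
```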

Mapping Specs

Specs can be defined for mapping/associative types and objects. To define a spec for a mapping type, pass a dictionary of specs to s. The keys should be the expected key value (most often a string) and the value should be the spec for values located in that key. If a mapping spec contains a key, the spec considers that key required. To specify an optional key in the spec, wrap the key in s.opt(). Optional keys will be validated if they are present, but allow the map to exclude those keys without being considered invalid.

s(
    {
        "id": s.str("id", format_="uuid"),
        "first_name": s.str("first_name"),
        "last_name": s.str("last_name"),
        "date_of_birth": s.str("date_of_birth", format_="iso-date"),
        "gender": s("gender", {"M", "F"}),
        s.opt("state"): s("state", {"CA", "GA", "NY"}),
    }
)

Above, the key "state" is optional in tested values, but if it is provided it must be one of "CA", "GA", or "NY".

Note

Mapping specs do not validate that input values only contain the expected set of keys. Extra keys will be ignored. This is intentional behavior.

Note

To apply the mapping Spec key as the tag of the value Spec, use s.dict_tag() to construct your mapping Spec. For more precise control over the value Spec tags, prefer s().

Mapping specs conform input dictionaries by applying each field’s conformer(s) to the fields of the input map to return a new dictionary. As a consequence, the value returned by the mapping spec default conformer will not include any extra keys included in the input. Optional keys will be included in the conformed value if they appear in the input map.
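A rough model of that behavior: iterate over the keys the Spec declares, skip optional keys that are absent, and build a new dict from the conformed values. In the hypothetical sketch below, required and optional keys are passed as two separate dicts of conformers; dataspec itself marks optional keys with s.opt().

```python
from typing import Any, Callable, Dict, Mapping

Conformer = Callable[[Any], Any]


def conform_mapping(
    value: Mapping[str, Any],
    required: Dict[str, Conformer],
    optional: Dict[str, Conformer],
) -> Dict[str, Any]:
    # Assumes the value has already been validated, so required keys are present
    out = {k: conform(value[k]) for k, conform in required.items()}
    for k, conform in optional.items():
        if k in value:  # optional keys appear only if present in the input
            out[k] = conform(value[k])
    return out  # undeclared keys are never copied into the result


record = {"first_name": " Ada ", "state": "ny", "nickname": "Countess"}
print(conform_mapping(record, {"first_name": str.strip}, {"state": str.upper}))
# {'first_name': 'Ada', 'state': 'NY'}
```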

Merging Mapping Specs

Occasionally, you may wish to declare your mapping Specs across two or more different Specs. It may be convenient to do so for composition of common keys across multiple Specs. In such cases, you may naturally turn to one of the builtin Combination Specs to return a union of the input Specs. However, combination Specs composed of mapping Specs with disjoint or only partially intersecting key sets will end up producing unexpected results. Recall mapping Specs have a default conformer which drops keys not declared in the input Spec, so the chained conformation of s.all() will drop keys potentially expected by later Specs.

To merge mapping Specs, use s.merge() instead.

s.merge(
    {"id": int},
    {
        "id": lambda v: v > 0,
        "first_name": str,
        s.opt("middle_initial"): str,
        "last_name": str,
    },
)

In the above Spec, id would be a required key, which must be an integer greater than zero. Specs for the remaining keys would match the Spec defined in the second input Spec.
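The following plain-Python sketch models both behaviors with dicts of predicates (optional keys are ignored for brevity, and all names are hypothetical). project mimics a mapping conformer that keeps only declared keys, so chaining two projections, as s.all() chains conformers, loses keys. merge_predicates instead combines the predicates for shared keys with a logical and, which matches the result described above for s.merge().

```python
from typing import Any, Callable, Dict, Mapping

Pred = Callable[[Any], bool]


def project(value: Mapping[str, Any], keys: Mapping[str, Any]) -> Dict[str, Any]:
    # Mimics a mapping conformer: keep only the keys the spec declares
    return {k: value[k] for k in keys if k in value}


def merge_predicates(*specs: Mapping[str, Pred]) -> Dict[str, Pred]:
    # Combine predicates for keys that appear in more than one spec with "and";
    # short-circuiting means later predicates only see values earlier ones accepted
    merged: Dict[str, Pred] = {}
    for spec in specs:
        for key, pred in spec.items():
            prev = merged.get(key)
            merged[key] = pred if prev is None else (lambda v, a=prev, b=pred: a(v) and b(v))
    return merged


ids = {"id": lambda v: isinstance(v, int)}
names = {"id": lambda v: v > 0, "first_name": lambda v: isinstance(v, str)}
record = {"id": 7, "first_name": "Ada"}

# Chained conformers: the first projection drops "first_name" before the second spec sees it
print(project(project(record, ids), names))  # {'id': 7}

merged = merge_predicates(ids, names)
print(all(merged[k](record[k]) for k in merged))  # True
print(merged["id"](-3))                           # False
```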

Note

Only mapping Specs may be merged. s.merge will throw a ValueError if you attempt to merge non-mapping type Specs. To combine mapping and non-mapping Spec types, you should wrap the mapping Specs with s.merge and pass that to s.all.

Key/Value Specs

Mapping Specs are useful for heterogeneous associative data structures for which the keys are known a priori. However, you may often wish to validate a homogeneous mapping with unknown keys. For such cases, you can turn to s.kv().

spec = s.kv(s.str(regex=r"[A-Z]{2}"), s.str(regex=r"[A-Z][\w ]+"))
spec.is_valid({"GA": "Georgia", "NM": "New Mexico"})  # True
spec.is_valid({"ga": "Georgia", "NM": "New Mexico"})  # False
spec.is_valid({"ga": "Georgia", "NM": "new mexico"})  # False

Note

By default, s.kv will not conform keys on input values, to avoid potentially creating duplicate keys from the key conformer. You can override this behavior with the conform_keys keyword argument.
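To see why, consider a key conformer that upper-cases state codes. Two distinct input keys can conform to the same output key, and a dict silently keeps only the last value. This is a plain-Python illustration, not dataspec code.

```python
from typing import Callable, Dict


def conform_keys(value: Dict[str, str], key_conformer: Callable[[str], str]) -> Dict[str, str]:
    # Later entries overwrite earlier ones when two keys conform to the same key
    return {key_conformer(k): v for k, v in value.items()}


states = {"GA": "Georgia", "ga": "Georgia (duplicate entry)", "NM": "New Mexico"}
conformed = conform_keys(states, str.upper)
print(len(states), len(conformed))  # 3 2
print(conformed["GA"])              # Georgia (duplicate entry)
```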

Tuple Specs

Specs can be defined for heterogeneous collections of elements, which is often the use case for Python’s tuple type. To define a Spec for a tuple, pass a tuple containing the Spec for each element at the corresponding tuple index:

s(
    (
        s.str("id", format_="uuid"),
        s.str("first_name"),
        s.str("last_name"),
        s.str("date_of_birth", format_="iso-date"),
        s("gender", {"M", "F"}),
    )
)

Tuple specs conform input tuples by applying each field’s conformer(s) to the fields of the input tuple to return a new tuple. If each field in the tuple spec has a unique tag and the tuple has a custom tag specified, the default conformer will yield a namedtuple with the tuple spec tag as the type name and the field spec tags as each field name. The type name and field names will be munged to be valid Python identifiers.
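A rough model of that conformation with collections.namedtuple follows. The hypothetical munge helper is only one possible way to make names valid identifiers (dataspec's own munging rules may differ); it also avoids names namedtuple rejects, such as field names starting with an underscore and Python keywords.

```python
import keyword
import re
from collections import namedtuple
from typing import Any, Optional, Sequence, Tuple


def munge(name: str) -> str:
    # Hypothetical rule: non-word characters become "_", leading underscores are
    # stripped (namedtuple rejects field names starting with "_"), names starting
    # with a digit get an "f_" prefix, and Python keywords get a trailing "_"
    ident = re.sub(r"\W", "_", name).lstrip("_")
    if not ident or ident[0].isdigit():
        ident = "f_" + ident
    if keyword.iskeyword(ident):
        ident += "_"
    return ident


def conform_tuple(
    tag: Optional[str], field_tags: Sequence[str], values: Tuple[Any, ...]
) -> Tuple[Any, ...]:
    names = [munge(t) for t in field_tags]
    # Fall back to a plain tuple without a custom tag or with duplicate field names
    if tag is None or len(set(names)) != len(names):
        return tuple(values)
    return namedtuple(munge(tag), names)(*values)


person = conform_tuple(
    "person-record", ["id", "first-name", "date_of_birth"], ("42", "Ada", "1815-12-10")
)
print(person)  # person_record(id='42', first_name='Ada', date_of_birth='1815-12-10')
```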

Combination Specs

In most of the previous examples, we used basic builtin Specs. However, real-world data often requires more nuanced specifications. Fortunately, Specs were designed to be composed. In particular, Specs can be composed using standard boolean logic. To specify an or Spec, you can use s.any() with any number of Specs.

spec = s.any(s.str(format_="uuid"), s.str(maxlength=0))
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b")  # True
spec.is_valid("")            # True
spec.is_valid("3837273723")  # False

Similarly, to specify an and spec, you can use s.all() with any n specs:

import uuid

spec = s.all(s.str(format_="uuid"), s(lambda id_: uuid.UUID(id_).version == 4))
spec.is_valid("4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b")  # True
spec.is_valid("b4e9735a-ee8c-11e9-8708-4c327592fea9")  # False

Note

Specs created with s.all() apply each child Spec’s conformer to the value during validation, so subsequent Specs may assume they receive the output of the previous Spec’s conformer.
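The sketch below models that chaining with plain functions: each step is a (predicate, conformer) pair, and each later step sees the previous step's conformed output. Here the first step parses a string into a uuid.UUID, so the second step can check .version directly. all_of is a hypothetical helper, not dataspec's s.all().

```python
import uuid
from typing import Any, Callable, List, Optional, Tuple

Step = Tuple[Callable[[Any], bool], Callable[[Any], Any]]


def is_uuid_str(v: Any) -> bool:
    try:
        uuid.UUID(v)
        return True
    except (TypeError, ValueError, AttributeError):
        return False


def all_of(steps: List[Step], value: Any) -> Tuple[bool, Optional[Any]]:
    # Validate each step against the output of the previous step's conformer
    for is_valid, conform in steps:
        if not is_valid(value):
            return False, None
        value = conform(value)
    return True, value


steps: List[Step] = [
    (is_uuid_str, uuid.UUID),                 # str -> uuid.UUID
    (lambda u: u.version == 4, lambda u: u),  # can rely on receiving a UUID object
]
print(all_of(steps, "4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b"))
# (True, UUID('4716df50-0aa0-4b7d-98a4-1f2b2bcb1c6b'))
print(all_of(steps, "b4e9735a-ee8c-11e9-8708-4c327592fea9"))  # (False, None)
```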

Note

The names any and all were chosen because or and and are reserved keywords in Python and cannot be used as method names.

Warning

Using a s.all() Spec to combine mapping Specs for maps with disjoint or only partially intersecting keys will result in maps losing keys during conformation and failing validation in later Specs. Use s.merge() to combine mapping Specs. Read more in Merging Mapping Specs.

Utility Specs

Often when dealing with real-world data, you may wish to allow certain values to be blank or None. We could handle these cases with Combination Specs, but since they occur so commonly, dataspec features a couple of utility Specs for quickly defining these cases. For cases where None is a valid value, you can wrap your Spec with s.nilable(). If you are dealing with strings and need to allow a blank value (as is often the case when handling CSVs), you can wrap your Spec with s.blankable().

spec = s.nilable("birth_date", s.str(format_="iso-date"))
spec.is_valid(None)          # True
spec.is_valid("1980-09-14")  # True
spec.is_valid("")            # False
spec.is_valid("09/14/1980")  # False, because the string is not ISO formatted

spec = s.blankable("birth_date", s.str(format_="iso-date"))
spec.is_valid(None)          # False
spec.is_valid("1980-09-14")  # True
spec.is_valid("")            # True
spec.is_valid("09/14/1980")  # False

In certain cases, you may be willing to accept invalid data and replace it with a default value during conformation. For such cases, you can wrap a Spec with s.default. The value supplied as the default keyword argument will be returned by the conformer whenever the inner Spec does not validate the input.

spec = s.default("birth_date_or_none", s.str(format_="iso-date"), default=None)
spec.is_valid(None)          # True; conforms to None
spec.is_valid("1980-09-14")  # True; conforms to "1980-09-14"
spec.is_valid("")            # True; conforms to None
spec.is_valid("09/14/1980")  # True; conforms to None

Note

As a consequence of the default value, s.default(...) Specs consider every value valid. If you do not want to permit all values to pass, you should not use s.default.

Occasionally, it may be useful to allow any value to pass validation. For these cases s.every() is perfect.

Note

You may want to combine s.every(...) with s.all(...) to perform a pre-conformation step prior to later steps. In this case, it may still be useful to provide slightly stricter validation to ensure your conformer does not throw an exception.