OpenRouter is the leading place to go to get general-purpose models of all sorts. It's fairly popular, processing tens of trillions of tokens a year.
OpenRouter is valued at >$500m and processes >$100m/year, 5% of which goes to them. Not that large compared to e.g. OpenAI, but it's the largest provider I'm aware of that doesn't produce its own models, and it has the largest selection.
Validating during parsing is still parsing; there's a reason why `Alternative f` exists, after all: you have to choose between branches of possibilities and failures. Now consider that there's another kind of validation that happens away from program boundaries (where broader-than-needed data is being constrained in a callee rather than at the calling site) and that should've been expressed as `Alternative f` during parsing instead. That's the main point of the article, but you seem to focus only on the literal occurrence of the word "validation" here and there.
So you are saying that if, at a certain point in parsing, the only expected terms are 'a', 'b' and 'c', one should not put the corresponding parsed entry in a `char` (after checking it is one of these, a.k.a. validating), and instead it should be put in some kind of enum type (parsed via `Alternative f`). Right?
You can put them wherever you like, be it in a char or a vector of them, but the bottom line is that your parsed items carry a "sanitized" label that lets you either tuple-unpack or pattern-match (as long as it's near or literally zero-cost) without ever performing the same validation again for the lifetime of the parsed object. Callees that exclusively expect 'a', 'b' and 'c', and that currently perform an internal validation step, should be replaced with versions that accept only inputs with sanitized labels. How you implement the labels depends on the language at hand (in Haskell they can be newtypes or labelled GADTs), but the crucial part is: the "validation" word is systematically pushed to the program boundaries, where it's made part of the parsing interface, with `Alternative f` and sanitization labels acting on raw data. In other words, you collapse validation into the parsing process, where the result value is assembled from a sequence of decisions to branch either into one of the possible successful options or into an error.
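A minimal Java sketch of the 'a'/'b'/'c' example (all names hypothetical): the enum itself is the sanitized label, and parsing is the only way to obtain one.

```java
// Hypothetical sketch: the "sanitized label" is the enum type itself.
// Once parsing succeeds, no callee ever needs to re-check the raw char.
enum Abc { A, B, C }

final class AbcParser {
    // Parsing collapses validation into construction: either we branch
    // into one of the successful options, or we fail at the boundary.
    static Abc parse(char c) {
        switch (c) {
            case 'a': return Abc.A;
            case 'b': return Abc.B;
            case 'c': return Abc.C;
            default:  throw new IllegalArgumentException(
                          "expected 'a', 'b' or 'c', got: " + c);
        }
    }
}
```

Downstream code then takes `Abc`, not `char`, and the compiler enforces that the validation already happened.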
> but the crucial part is: the "validation" word is systematically pushed to the program boundaries
Yea, so again: isn't that freaking obvious?! The author seems to be experienced in Haskell, where this kind of thing is common knowledge, and for some reason it seems to be some kind of revelation to them...
Apparently not, as I keep finding snippets of this pattern from my coworkers (and I've worked at many companies, including ones that require precision for legal compliance):
def do_business_stuff(data):
    orders = data.get("orders")
    if not orders:
        return
    for order in orders:
        attr = order.get("attr")
        if attr and len(attr) < 5:
            continue
        ...
The industry's awareness baseline is very low, and that's across tech stacks; Haskell is no exception. I've seen the stuff people do with Haskell at a 9-to-5, when the only thing devs cared about was carrying on (and preferably migrating to Go), and I wasn't impressed at all (compared to the pure gems that can be found on Hackage). So in that sense, having an article that says "actually, parse once, don't validate everywhere" is very useful: you can keep sending the link over and over again until people either get tired of you or learn the pattern.
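The parse-once alternative to the snippet above, sketched in Java (the 5-character rule comes from the snippet; the `Attr` type and everything else here is made up for illustration):

```java
// Hypothetical sketch: instead of re-checking "attr" inside business
// logic, parse it into a type that cannot hold an invalid value.
final class Attr {
    final String value;

    private Attr(String value) { this.value = value; }

    // The only way to obtain an Attr is through parse(), so every Attr
    // reaching business code is already known to satisfy the rule.
    static Attr parse(String raw) {
        if (raw == null || raw.length() < 5) {
            throw new IllegalArgumentException("attr must be at least 5 chars");
        }
        return new Attr(raw);
    }
}
```

The `if attr and len(attr) < 5: continue` guard then disappears from every loop body; it lives once, at the boundary.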
> They help define and navigate the representation of composite things as opposed to just having dynamic nested maps of arbitrary strings.
What would you say to someone who thinks that nested maps of arbitrary strings have maximum compatibility, and using types forces others to make cumbersome type conversions?
If the fields of a structure or the string keys of an untyped map don't match then you don't have compatibility either way. The same is not true for restricting the set of valid values.
edit: To put it differently: To possibly be compatible with the nested "Circle" map, you need to know it is supposed to have a "Radius" key that is supposed to be a float. Type definitions just make this explicit. But just because your "Radius" can't be 0, you shouldn't make it incompatible with everything else operating on floats in general.
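A tiny Java sketch of that contrast (names hypothetical): the typed version states the "Radius is a float" contract explicitly, while the radius itself stays an ordinary double, compatible with all float-handling code.

```java
import java.util.Map;

// Typed version: the shape of the data is a compile-time fact,
// while radius remains a plain double usable by any float code.
record Circle(double radius) {}

public class CircleDemo {
    public static void main(String[] args) {
        // Untyped: the "Radius" key and its float-ness are implicit
        // conventions, checked only at runtime via a cast.
        Map<String, Object> circleMap = Map.of("Radius", 2.0);
        double fromMap = (Double) circleMap.get("Radius");

        // Typed: the same information, checked by the compiler.
        double fromType = new Circle(2.0).radius();

        System.out.println(fromMap == fromType); // prints "true"
    }
}
```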
I'd wager a fair share of grads don't yet understand :/
> The transaction didn't help. Postgres's default isolation level is READ COMMITTED — each statement sees all data committed before that statement started.
> work that could be done efficiently by the RDBMS
the RDBMS? You're only using one? Why not spread the work out a little? Even if you think you write all your queries efficiently, nothing stops your teammates from DoS'ing your efficient queries by writing inefficient ones themselves. Last week our team started piling up write timeouts because another team was modifying one of their tables. Not in their db, in the db.
> Queries become much more convoluted
Please, every ounce of effort invested in ORMs like EF/LINQ is to make code look less like querying and more like plain old object access. For the most part, devs want to work with objects and store objects. If you didn't go the RDBMS route, you wouldn't need EF/LINQ's help in decomposing your objects and scattering their parts across separate tables. The least convoluted query possible is to just grab the object you wanted directly.
- “Value capture,” as called out in the article. If new tools make engineers 10x more productive, that should be reflected in compensation
- End employment law workarounds like “unlimited PTO,” where your PTO is still limited in practice, but it’s not a defined or accruing benefit
- Protection against dilution of equity for employees
- A seat at the table for workers, not just managers, in the event of layoffs
- Professional ethics and whistleblower protections. Legally protected strikes if workers decide to refuse to pursue an ethically or legally dubious product or feature.
I could go on. There are a lot of abuses we put up with because of relatively high salaries, and it is now abundantly clear that the billionaire capital-owning class is dead set on devaluing the work we do to “reduce labor costs.” We can decide not to go along with that.
Suppose you're receiving bytes representing a User at the edge of your system. If you put json bytes into your parser and get back a User, then put your User through validation, that means you know there are both 'valid' Users and 'invalid' Users.
Instead, there should simply be no way to construct an invalid User. But this article pushes a little harder than that:
Does your business logic require a User to have exactly one last name, and one-or-more first names? Some people might go as far as having a private-constructor + static-factory-method create(..), which does the validation, e.g.
class User {
    private List<String> names;

    private User(List<String> names) {..}

    public static User create(List<String> names) throws ValidationException {
        // Check for name rules here
    }
}
Even though the create(..) method above validates the name rules, you're still left holding a plain old List-of-Strings deeper in the program when it comes time to use them. The name rules were validated and then thrown away! Now do you check them when you go to use them? Maybe?
If you encode your rules into your data-structure, it might look more like:
class User {
    String lastName;
    NeList<String> firstNames;

    private User(List<String> names) throws ValidationException {..}
}
If I were doing this for real, I'd probably have some Name rules too (as opposed to a raw String). E.g. only some non-empty collection of utf8 characters which were successfully case-folded or something.
Is this overkill? Do I wind up with too much code by being so pedantic? Well no! If I'm building valid types out of valid types, perhaps the overall validation logic just shrinks. The above class could be demoted to some kind of struct/record, e.g.
record User(Name lastName, NeList<Name> firstNames);
Before I was validating Names inside User, but now I can validate Names inside Name, which seems like a win:
class Name {
    private String value;

    private Name(String name) throws ValidationException {..}
}
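Filling in the elided constructor, a runnable Name might look like this (the specific rules, and the use of an unchecked exception instead of the ValidationException above, are illustrative assumptions):

```java
final class Name {
    private final String value;

    // Validation happens exactly once, at construction; a Name in hand
    // is proof the rules held. (The rules here are illustrative only.)
    Name(String value) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        this.value = value.strip();
    }

    String value() { return value; }
}
```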
If you spend your life talking about bool having two values, and then need to act as if it has three or 256 values or whatever, that's where the weirdness lives.
In C, true doesn't necessarily equal true.
In Java (myBool != TRUE) does not imply that (myBool == FALSE).
Maybe you could do with some weirdness!
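The Java case comes from the boxed Boolean, where null sneaks in as a third value; a minimal demonstration:

```java
public class ThreeValues {
    public static void main(String[] args) {
        Boolean myBool = null; // boxed Boolean: TRUE, FALSE, or null

        // null is != TRUE, yet also != FALSE: the "third value".
        System.out.println(myBool != Boolean.TRUE);  // prints "true"
        System.out.println(myBool == Boolean.FALSE); // prints "false"

        // Unboxing null is where the weirdness bites:
        // boolean b = myBool; // would throw NullPointerException
    }
}
```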
In Haskell:
Bool has two members: True & False. (If it's True, it's True. If it's not True, it's False).
Unit has one member: ()
Void has zero members.
To be fair I'm not sure why Void was raised as an example in the article, and I've never used it. I didn't turn up any useful-looking implementations on hoogle[1] either.
What were you expecting to find? A function which returns an empty type will always diverge, i.e. there is no return of control, because that return would have to produce a value that we've said doesn't exist. In a systems language like Rust there are functions like this; for example, std::process::exit is a function which... well, hopefully it's obvious why that doesn't return. You could imagine that, likewise, if one day the Linux kernel's reboot routine were Rust, that too would never return.
My UserService doesn't know that it's talking to a UserDB (ironically I learned this from Uncle Bob).
All UserService knows is it has a dependency which, if passed a UserId, will return a User (+/- whatever the failure mode is .. Future<User>? Promise<User>? ReaderT m User?)
When I change my mind about what UserService requires or what UserDB provides (which I frequently do), I immediately look at all the red underlines that my compiler & static types tell me about.
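A minimal sketch of that dependency shape in Java (the interface name and signatures are hypothetical, chosen from the comment's UserId/User vocabulary):

```java
import java.util.concurrent.CompletableFuture;

record UserId(long value) {}
record User(UserId id, String name) {}

// UserService only knows about this capability, not about any UserDB.
interface UserRepository {
    CompletableFuture<User> findUser(UserId id);
}

final class UserService {
    private final UserRepository repo;

    UserService(UserRepository repo) { this.repo = repo; }

    // If the shape of UserRepository changes, this is where the
    // compiler's red underlines show up first.
    CompletableFuture<String> greet(UserId id) {
        return repo.findUser(id).thenApply(u -> "hello, " + u.name());
    }
}
```

Any implementation, a real database client or a one-line lambda in a test, satisfies the dependency equally well.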
Is this like Windows and MacOS not being in the top 10 of distrowatch.com?