This is just a company advertisement, not even one that’s well done. They didn’t...

diptanu · 2025-11-06T19:20:47 1762456847

Hey! I am the founder of Tensorlake. We benchmarked the models that our customers consider using in enterprises or regulated industries, where there is a big need to process documents for various kinds of automation. Benchmarking takes a lot of time, so we focused on the ones we get asked about.

On Gemini and other VLMs - we excluded these models because they don't do visual grounding, i.e. they don't provide page layouts or bounding boxes of the elements on each page. This is a table-stakes feature for the use cases customers are building with Tensorlake. It wouldn't be possible to build citations without bounding boxes.
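
To illustrate why citations depend on bounding boxes, here is a minimal sketch. It assumes a parser that returns each layout element with its page number and box; the `Element` type and the `cite` helper are hypothetical names made up for this example, not Tensorlake's or Google's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """One layout element from a hypothetical document parser."""
    page: int                                # 1-based page number
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
    text: str


def cite(elements: list[Element], quote: str) -> Optional[Element]:
    """Return the first element whose text contains the quoted snippet.

    Matching ignores case and collapses whitespace. The returned element
    carries the page and box needed to highlight the source region.
    """
    needle = " ".join(quote.split()).lower()
    for el in elements:
        if needle in " ".join(el.text.split()).lower():
            return el
    return None


# Made-up sample data, only for illustration.
elements = [
    Element(1, (72.0, 90.0, 540.0, 130.0), "Invoice total: 1,250.00 USD"),
    Element(2, (72.0, 200.0, 540.0, 260.0), "Payment is due within 30 days."),
]
hit = cite(elements, "due within 30 days")
```

With text-only output the match could still be found, but there would be no page or box to point the reader at, which is the gap described above.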

On pricing - we are probably the only company offering pure on-demand pricing without any tiers. With Tensorlake, you can get back markdown from every page, summaries of figures, tables and charts, structured data, page classification, etc. in ONE API call. This means we are running a bunch of different models under the hood. If you add up the token count and the complexity of the infrastructure needed to build a pipeline around Gemini and other OCR/layout detection models, I bet the price you would end up with won't be any cheaper than what we provide :) Plus, doing this at scale is very complex. It requires building a lot of sophisticated infrastructure, which is another source of cost behind modern document ingestion services.

coderintherye · 2025-11-06T19:36:58 1762457818

Google's Vertex API for document processing absolutely does bounding boxes. In fact, some of the document processors are just a wrapper around Google's product.

diptanu · 2025-11-06T19:48:55 1762458535

OP mentioned Gemini, not Google’s Vertex OCR API, which has very different performance and accuracy characteristics from Gemini.

ianhawes · 2025-11-06T19:30:52 1762457452

I just tested a non-English document and it rendered English text. Does your model not support anything other than English?

diptanu · 2025-11-06T19:50:02 1762458602

It does. We have users in Europe and Asia using it with non-English languages. Can you please send me a message at diptanu at tensorlake dot ai? I would love to see why it didn’t work.

hotpaper75 · 2025-11-06T19:21:00 1762456860

Thanks for mentioning them. Indeed, their post seems to surface only a couple of names in the field, and maybe not the most relevant ones.

JLO64 · 2025-11-06T19:03:47 1762455827

Personally I use OpenAI models via the API for transcription of PDF files. Is there a big difference between them and Gemini models?