One of the more senior engineers I worked with told me: "Every real-life data structure I encountered was tree-like".
It would be easiest to just ask the browser to render a fragment of HTML onto a canvas, or onto some invisible bitmap, like you can with most other UI toolkits.
They would never do this because of fingerprinting, which is already the reason we cannot 'just' do a lot of things, unfortunately.
E: And the infamous other half: malware. A bit over a decade ago, malware devs started using canvas to do things like hide fragments inside the bitmap data of seemingly harmless ads; a second script would then extract and assemble them to evade detection.
The web platform can already do this; see SVG foreignObject elsewhere in the thread. The key is to have the proper bounds in place (cross-origin resources, etc.), and the infrastructure for that already exists.
This just removes the extra step of relying on SVG to render the HTML, adds a path for getting this content into the accessibility tree, and supports input on the rendered elements.
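For anyone who hasn't seen the foreignObject trick, here is a rough sketch of the string-building half of it. The helper names (`buildForeignObjectSvg`, `svgToDataUrl`) are made up for illustration, and the fragment is assumed to be well-formed XHTML (so `<br/>`, not `<br>`). The drawing step is browser-only, so it is shown as a comment rather than as live code.

```typescript
// Hypothetical helper: wrap an XHTML fragment in an SVG <foreignObject>
// so the browser can load it as an image. The fragment must be valid XML.
function buildForeignObjectSvg(xhtml: string, width: number, height: number): string {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<foreignObject width="100%" height="100%">` +
    `<div xmlns="http://www.w3.org/1999/xhtml">${xhtml}</div>` +
    `</foreignObject></svg>`
  );
}

// Hypothetical helper: turn the SVG markup into a data URL usable as an image source.
function svgToDataUrl(svg: string): string {
  return "data:image/svg+xml;charset=utf-8," + encodeURIComponent(svg);
}

// Browser-only usage (not executed here), assuming `ctx` is a 2D canvas context:
//
//   const img = new Image();
//   img.onload = () => ctx.drawImage(img, 0, 0);
//   img.src = svgToDataUrl(buildForeignObjectSvg("<p>Hello</p>", 200, 100));
```

Note that an SVG loaded as an image cannot pull in external resources, so styles and images have to be inlined. Whether the canvas stays readable afterwards has varied between browsers, so treat that part as something to test rather than assume.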
Yeah, that's already available in Firefox for chrome/extensions, but not allowed for the web due to fingerprinting and other security risks. For example, rendering an iframe of your bank account…