Depends on what you mean by "time". Going by algorithmic complexity time it's at least linear yes. Going by wall clock time (the more important one for interview code) it's constant time.
The time to spawn the threads is just inconsequential compared to the time to wait for the threads to exit for small sizes of N. But by doubling the number of numbers to sort you do double the time (actual wall clock time) that it takes to spawn those threads. O(N + 10000000) is still O(N)
Not to mention for the OS to handle N threads would likely take some kind of non-linear time, at least N^2 if I had to take a shot in the dark. But I imagine it'd depend on the OS and how the hardware handles interrupts. Even if the thread is sleeping it still takes resources to schedule it
You're forgetting that algorithmic complexity deals with asymptotic behavior.
An algorithm that takes N inputs, does N operations, and then sleeps for 10 seconds is not constant time, because in the asymptotic case the N iterations will take longer than the sleep.
It's constant with regards to the length of the input array, which is the usual metric for sort algorithm performance. That is, d(runtime)/d(len(input)) can be constant, even though d(runtime)/d(max(input)) is not, if you assume that len(input) is independent from max(input) (which is inconsistent with input having uniformly distributed random members, which is probably the more common assumption).
I don't see how it is. Different, but the same number, of elements is a different time. There's no constant time between [0, 1] and [0, 10], both having two elements.
The main "value" in the domain of sorting problems is the number of elements in your collection.
A subproblem also considers the elements to be integers, then they become another "value" domain. (But in general, sorting problems only need their elements to be comparable, not necessarily integers.)