I believe GP means that "a.b" should be enough for pointers as well, you don't n...

flohofwoe · on Oct 10, 2023

I really prefer that '.' and '->' remain separate, because it makes it immediately clear when a potentially expensive pointer dereference is happening:

    x = a->b->c->d->e;

...is a pointer-hunting-nightmare / potential cache-miss-galore.

    x = a.b.c.d.e;

...no problem, the whole chain is resolved at compile time into a single offset and results in at most one memory access.

    x = a.b.c->d.e;

...makes it immediately clear that c->d is a pointer access and everything else isn't.

PS: ...and of course C++ messed this simple rule up with the introduction of references.

nuancebydefault · on Oct 10, 2023

When stepping up from C to C++, it feels ultimately illogical to use an ampersand to indicate a reference in a declaration, while in an assignment or actual parameter it means address-of. Were they out of special chars?

adrian_b · on Oct 10, 2023

Sorry, I had posted a partial reply before finishing writing the complete one.

As I have written above, a language with implicit dereferencing needs additional syntactic means for denoting the cases when the values of the pointers are needed, e.g. for doing pointer arithmetic, which is frequent in C.

If dereferencing would have been made implicit, that would have required a large number of changes in the language. C++ has introduced pointers with implicit dereferencing, i.e. what C++ calls references, but due to the fact that the other syntax changes that are needed have not been made, because they would break compatibility, the C++ references can replace C pointers only in a subset of their uses.

tsimionescu · on Oct 10, 2023

No one here is proposing implicit dereferencing (everywhere), the proposal is just to make the syntax `pointer.field` have the exact same meaning that `pointer->field` has in C today. In all other places, the current syntax and semantics would stay the same.

Since in all situations where pointer->field is valid, pointer.field is an explicit error in C today, this would have been very much doable without major changes in compiler or language implementation.

Of course, at C's level of abstraction, and given the speed limitations of the day, the syntax difference between -> and . may well be argued to have been helpful instead of harmful.

And yes, the same could not be said for C++, where you can implement dereferencing for your own type and get a variable where both `a.b` and `a->b` are valid and have different meanings. This is anyway not a proposal for changing C (or C++) today, merely a "what if" discussion about how C could have been designed differently.

dataflow · on Oct 10, 2023

> If dereferencing would have been made implicit

You're misunderstanding the comment. They aren't asking for implicit derefs. They're asking for explicit derefs, using * or . instead of * or ->

psd1 · on Oct 10, 2023

Your comment and GP's make no sense unless the reader knows that * is a markdown directive and infers where you typed them.

Which is ironic when discussing overloading!

dataflow · on Oct 10, 2023

I'm confused, where in my comment does the Markdown syntax come into play? You're not seeing italics or backslashes are you?

psd1 · on Oct 10, 2023

I see now - Hacki renders italics, but the browser doesn't.

Excuse my confusion, this possibility didn't occur to me.

fanf2 · on Oct 10, 2023

Hacker News does not use Markdown.

masklinn · on Oct 10, 2023

While that’s technically true, it does use a severely cut down version thereof which is worse in every way: it only supports paragraphs, code blocks, and emphasis so the ability to format is extremely limited but the emphasis can still screw you over, which requires an explicit preview (or posting) to notice.

And I think escaping emphasis was only introduced somewhat recently? I do remember that for the longest time you basically had to trick HN into not breaking your comments by using a different character instead.

OJFord · on Oct 10, 2023

No, but it does use `*` for italics, which sometimes causes a comment to go haywire because someone uses it to reference a footnote or otherwise drops it in mid-sentence, with a matching closing one, not intending italics.

(Up-thread I think it probably was like that when they commented, but has since been fixed.)

wruza · on Oct 10, 2023

Btw, you can now(?) escape * in a paragraph with \*.

OJFord · on Oct 10, 2023

Yes, it's also automatic in some cases - originally my comment had an attempt at a joke example of it, but it was caught and didn't work. It was definitely improved relatively recently.

adrian_b · on Oct 10, 2023

With explicit derefs, if "*x.y" is taken to mean "(*x).y", then which is the meaning of "****x1.x2[7].x3.x4[9].x5"?

> "a.b" should be enough for pointers as well,

I have interpreted this to mean implicit deref, as there is no "*" (could be a formatting problem).

dataflow · on Oct 10, 2023

I think you're still misunderstanding the proposal. What are the types of your variables? This discussion makes no sense without type information. We're defining the meaning of . for pointers here. You seem to be misunderstanding the proposal as being purely a textual transformation that ignores types?

The proposal is literally: "If you see . and the left operand is a pointer, pretend the . was -> instead, because otherwise the code is already invalid."

adrian_b · on Oct 10, 2023

OK, I believe that you are right and if implicit dereferencing had been done only when this is the only interpretation that leads to a valid expression, then "->" would not have been necessary.

However, I assume that this would have been a too complex solution for compilers that had to work in a few tens of kilobytes of memory, while a postfix "*", as already used a decade before C, would have been a trivial solution.

masklinn · on Oct 10, 2023

> However, I assume that this would have been a too complex solution for compilers that had to work in a few tens of kilobytes of memory

It's literally the same complexity as the type checking compilers already do to tell you that your `.` does not work because the LHS is a pointer not a struct.

adrian_b · on Oct 11, 2023

It is not the same complexity, because syntax checking stops immediately at an error.

To determine if implicit dereferencing may be applied, more analysis has to be done, because there may be not only a pointer to a structure, but a pointer to a pointer to a structure and so on, so multiple implicit dereferencing may be needed to obtain a valid expression.

However I agree that the difference in complexity is not big.

tsimionescu · on Oct 11, 2023

`a.b` is always valid syntax, there's no way to know if it is a valid C instruction until you resolve the types of a and b. Assigning it valid semantics at that point is exactly as easy as assigning it error semantics.

And no, this would not perform multiple levels of dereferencing any more than -> does today. You could have literally find&replaced every use of -> with . and every C program would have had the exact same semantics. `struct point **a; a.x = 1;` would throw the exact same compilation error that `struct point **a; a->x = 1;` throws today. The only difference would be that `struct point *a; a.x = 1;` would write 1 to the field x of the object pointed to by a, instead of throwing an error that says "object of type struct point* has no field named x".

loup-vaillant · on Oct 10, 2023

> As I have written above, a language with implicit dereferencing needs additional syntactic means for denoting the cases when the values of the pointers are needed, e.g. for doing pointer arithmetic, which is frequent in C.

No it doesn’t. Let’s take this example in valid C:

  struct my_struct { int field; };
  struct my_struct  s = { .field = 42 };
  struct my_struct *p = &s;
  printf("direct : %d", s.field);
  printf("pointer: %d", p->field);
  printf("address: %p", (void *)p);

What is being proposed here is to make that code valid:

  struct my_struct { int field; };
  struct my_struct  s = { .field = 42 };
  struct my_struct *p = &s;
  printf("direct : %d", s.field);
  printf("pointer: %d", p.field); // note the use of a dot here
  printf("address: %p", (void *)p);

That is, the naked p is still to be interpreted as what it is: a pointer. It’s just that when we write `a.b`, the language would first check the type of `a`, then dereference it as many times as necessary to get to the underlying struct, and then access its field. For instance:

  struct my_struct { int field; };
  struct my_struct    s   = { .field = 42 };
  struct my_struct   *p   = &s;
  struct my_struct  **pp  = &p;
  struct my_struct ***ppp = &pp;

Now let’s see how this automatic indirection would work:

  // All would print the same value
  printf("%d", ppp.field);
  printf("%d", pp .field);
  printf("%d", p  .field);
  printf("%d", s  .field);

  // We can still use explicit indirections
  printf("%d", (*ppp  ).field);
  printf("%d", (**ppp ).field);
  printf("%d", (***ppp).field);
  printf("%d", (*pp   ).field);
  printf("%d", (**pp  ).field);
  printf("%d", (*p    ).field);

We can still get to the actual addresses no problem:

  printf("p: %p", (void *)p);
  printf("p: %p", (void *)*pp);
  printf("p: %p", (void *)**ppp);

  printf("pp: %p", (void *)pp);
  printf("pp: %p", (void *)*ppp);

  printf("ppp: %p", (void *)ppp);

Note that we can play with the & operator too. In valid C we can do this already:

  printf("s.field: %d", s.field);
  printf("s.field: %d", (*&s).field);
  printf("s.field: %d", (**&&s).field);
  printf("s.field: %d", (***&&&s).field);

With automatic indirection the following would be valid too:

  printf("s.field: %d", (&s).field);
  printf("s.field: %d", (&&s).field);
  printf("s.field: %d", (&&&s).field);

---

The kicker here is that the decision on whether an access to a struct member requires dereferencing the pointer or not, is not done at parsing time. It’s done at type checking time. And by the way, in standard C the decision to give you an error or not is already done at type checking time. All this to say, this would be a fairly benign change to compilers.

Now would users get confused? Possibly. With the conflation of pointers and arrays, the following would be equivalent:

  array.field
  (*array).field
  array[0].field

Looks nifty to some perhaps, but some people really meant:

  array[i].field

and forgot to write the index.