Some people in that discussion wonder about "->", which is used for indirect addressing through a structure member.
When C added structures, which did not exist in B, it took the keyword "struct" and both "." and "->" from IBM's PL/I language, from which C also took some other features.
In general, almost every feature C added to B was taken either from PL/I or from Algol 68. The exceptions are "continue" and the generalized "for", which did not exist in any previous language.
(However, the generalized "for" of C was a mistake, because it complicates the frequent use cases in order to simplify seldom encountered use cases. The right way to generalize "for", i.e. with iterators, was introduced by Alphard in the same year as C, i.e. in 1974.) (Compare "for (I=0;I<N;I+=5) {" of C with "for I from 0 to N by 5 do" or "for I in 0:N:5 do" of previous languages. C requires typing a lot of redundant characters in the most frequent cases.)
The oldest symbol for indirection through a pointer (in the language Euler, in January 1966) was a raised middle dot (i.e. a point). This was before ASCII was in wide use, and ASCII did not include the raised middle dot (U+00B7), so it was replaced by the most similar ASCII character, "*".
Euler had used "@" for "address of", and indirection was a postfix operator, as it should be. Making "*" a prefix operator in B and C was a mistake, which forced the import of "->" from PL/I to avoid an excessive number of parentheses. Otherwise "(*x).y" would have been needed instead of "x->y". With a postfix "*", that would have been "x*.y", and "->" would not have been needed.
In CPL, the ancestor of BCPL and B, indirection was implicit, like with the C++ references. Instead of having an "address of" operator, CPL had a distinct symbol for an assignment variant that assigns the address of a variable, instead of assigning its value.
> the generalized "for" of C was a mistake, because it complicates the frequent use cases in order to simplify seldom encountered use cases
The worst offender is the quasi-obligatory "break" statement in "case" blocks. Fall-through cases are useful for things like lexers but probably not needed in 99.999% of other programs. I wonder how many millions of (wo)man hours have been wasted on debugging missing breaks. (yes, I know that linters exist)
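To make the bug class concrete, a minimal sketch (hypothetical names, not from the thread) of the kind of missing break that eats those hours:

    void init(void);
    void step(void);

    enum state { START, RUN };

    void tick(enum state s)
    {
        switch (s) {
        case START:
            init();     /* oops: missing break, silently falls into RUN */
        case RUN:
            step();
            break;
        }
    }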
> The worst offender is the quasi-obligatory "break" statement in "case" blocks.
This was (almost certainly) done to simplify the compiler. CASE in C is not actually a structured control construct despite its syntactic appearance, it's just a computed GOTO. This is what makes things like Duff's device [1] possible.
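For readers who haven't seen it, this is the classic formulation (lightly modernized; in the original, "to" is a memory-mapped device register, hence not incremented, and count is assumed positive). The switch jumps into the middle of the do-while, which is only legal because case labels are just goto targets:

    void send(short *to, short *from, int count)
    {
        int n = (count + 7) / 8;        /* number of do-while passes */
        switch (count % 8) {            /* jump into the middle of the loop */
        case 0: do { *to = *from++;
        case 7:      *to = *from++;
        case 6:      *to = *from++;
        case 5:      *to = *from++;
        case 4:      *to = *from++;
        case 3:      *to = *from++;
        case 2:      *to = *from++;
        case 1:      *to = *from++;
                } while (--n > 0);
        }
    }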
I think computed-GOTO is easier to understand and more flexible than Duff's device, but computed-GOTO is not part of the C standard. It is sometimes available as a non-portable extension, e.g. https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html. Anyone know why it is not included in C?
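A minimal sketch of that GCC extension (non-portable; see the link above): && takes the address of a label, and goto * jumps to it. The opcode stream here is made up for illustration:

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical opcode stream: inc, inc, dec, halt */
        static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
        int program[] = { 0, 0, 1, 2 };
        int acc = 0, pc = 0;

        goto *dispatch[program[pc++]];
    op_inc:  acc++; goto *dispatch[program[pc++]];
    op_dec:  acc--; goto *dispatch[program[pc++]];
    op_halt:
        printf("%d\n", acc);    /* prints 1 */
        return 0;
    }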
Yeah, I remember having my mind blown when I first discovered Duff's Device.
My mental picture of what C-compilers are up to turned out to be a lot more sophisticated than reality.
But having written a few Forth-interpreters, it makes total sense to me. Treating the input as a mostly unstructured stream of tokens is very convenient.
What I find inconvenient is that a case body can be written with or without curly brackets. I'd rather they were mandatory, like when an if statement applies to more than one statement. The downside is that you need more indentation to have a logical curly-bracket structure.
Choices/options for the same result tend to make code less readable.
    case 1:
        ... code ...
        goto case;
    case 2:
        ... code ...
which makes it clear. No new keywords are needed; C could adopt this easily. All those /* fallthrough */ comments, warnings and compiler switches would just go away.
That (also) came from BCPL. (BCPL had a separate `endcase` statement; `break` applied only to loops. It also had `docase ‹expr›`, which is the equivalent of a loop `continue`, re-entering the `switchon`.)
As I recall, B was modelled on an early BCPL compiler that didn't have an ENDCASE statement, hence the overloading of break for exiting a switch block.
You don't really need a linter. It's just a warning setting, see:
-Wimplicit-fallthrough
From the compiler docs. It's just not a problem. If you waste time once and learn about warning options in the compiler you will be better off anyway.
It's like the panic over assignment being an expression - something that maybe bites you once, and then you learn to read the compiler warnings and it never causes problems again.
If you don't want to read warnings, then there is the [[fallthrough]] attribute.
No separate linters needed, but details differ between compilers.
GCC has a warning in the '-Wextra' warning set, Clang requires the explicit option '-Wimplicit-fallthrough', MSVC is completely silent (apparently it's in the CppCoreCheck rules though).
This should really be in the default warning set with an annotation that the fallthrough is intended (unless the case-branch is completely empty).
Note that gcc and clang will warn you about fallthrough statements if you use -Wextra, and that you then have to insert a /* fallthrough */ comment to make your intent clear.
Pity the (wo)man who does not use -Wall -Wextra for (s)he is truly mistaken
(For backwards compatibility, it still must fallthrough with or without the attribute; the attribute just signals programmer intention and silences the warning.)
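For illustration, the attribute (standardized in C23, mirroring C++17; GCC and Clang also accept __attribute__((fallthrough)) as an extension) looks like this - the handler names are made up:

    void handle_a(void);
    void handle_b(void);

    void dispatch(char c)
    {
        switch (c) {
        case 'a':
            handle_a();
            [[fallthrough]];    /* deliberate: 'a' also runs the 'b' code */
        case 'b':
            handle_b();
            break;
        }
    }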
Incidentally, lint was invented exactly because of this. Rather than improving the way C was, they decided it was better, in the UNIX tradition, to add yet another tool - which those millions would rather not use.
> Otherwise "(*x).y" would have been needed instead of "x->y". With a postfix "*", that would have been "x*.y", and "->" would not have been needed.
I mean technically the compiler could just have been less stupid and made `.` auto-deref, it’s not like C has operator overloading so the LHS is either a struct or a pointer, not both.
It is not that simple, because in C you may chain a great number of prefix and postfix operators.
It is frequent to have multiple "*", "[]", "." and "->".
Now you have the simple rule that the postfix operators are applied first, from left to right, then the prefix operators, from right to left.
If special rules had been invented to avoid writing the parentheses in "(*x).y", then after adding several "*", "[]" and "." it would have become impossible to understand the right evaluation order.
When all of "*", "[]" and "." are postfix, they are just executed in order from left to right, which is easy to understand.
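A small C illustration of that rule, with made-up types; the comments also note what the hypothetical postfix "*" spelling from above would look like:

    struct node { struct node *next; int vals[4]; };

    int example(struct node *p)
    {
        /* postfix first, left to right: ((p->next)->vals)[1] */
        int a = p->next->vals[1];
        /* prefix * binds last: *(p->next->vals), i.e. element 0,
           not (*p)->next->vals; with a postfix "*" this could have
           been written p*.next*.vals[0] with no precedence surprise */
        int b = *p->next->vals;
        return a + b;
    }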
While sometimes auto-dereferencing a pointer would be convenient, in C you frequently want to get the value of a link, or to do pointer arithmetic. With auto-dereferencing, some new ways of using an "address of" operator would have to be invented, and they are unlikely to be simpler than the current rules.
As I have said, the two languages that introduced the concept of pointers were Euler, with explicit dereferencing, and CPL, with implicit auto-dereferencing.
Both are valid ways to design a programming language, but in both cases a lot of rules must be added to cover all the use cases. Those rules would have to be more complex in the languages with auto-dereferencing, so such languages usually solve the problem lazily, by just prohibiting some uses, e.g. pointer arithmetic.
I believe GP means that "a.b" should be enough for pointers as well, you don't need "a->b" or "(*a).b". It's not like pointers can have fields in C, so the meaning would have been unambiguous.
When stepping up from C to C++, it feels ultimately illogical to use an ampersand to indicate a reference in a declaration, while in an assignment or actual parameter it means address-of. Were they out of special chars?
Sorry, I had posted a partial reply before finishing writing the complete one.
As I have written above, a language with implicit dereferencing needs additional syntactic means for denoting the cases when the values of the pointers are needed, e.g. for doing pointer arithmetic, which is frequent in C.
If dereferencing had been made implicit, that would have required a large number of changes in the language. C++ did introduce pointers with implicit dereferencing, i.e. what C++ calls references, but because the other syntax changes that would be needed have not been made - they would break compatibility - C++ references can replace C pointers in only a subset of their uses.
No one here is proposing implicit dereferencing (everywhere), the proposal is just to make the syntax `pointer.field` have the exact same meaning that `pointer->field` has in C today. In all other places, the current syntax and semantics would stay the same.
Since in all situations where pointer->field is valid, pointer.field is an explicit error in C today, this would have been very much doable without major changes in compiler or language implementation.
Of course, at C's level of abstraction, and given the speed limitations of the day, the syntax difference between -> and . may well be argued to have been helpful instead of harmful.
And yes, the same could not be said for C++, where you can implement dereferencing for your own type and get a variable where both `a.b` and `a->b` are valid and have different meanings. This is anyway not a proposal for changing C (or C++) today, merely a "what if" discussion about how C could have been designed differently.
While that’s technically true, it does use a severely cut-down version thereof which is worse in every way: it only supports paragraphs, code blocks, and emphasis, so the ability to format is extremely limited, but the emphasis can still screw you over, which requires an explicit preview (or posting) to notice.
And I think escaping emphasis was only introduced somewhat recently? I do remember that for the longest time you basically had to trick HN into not breaking your comments by using a different character instead.
HN uses asterisks for italics, which sometimes causes a comment to go haywire because someone uses one to reference a footnote or otherwise drops it in mid-sentence, with a matching closing one, not intending italics.
(Up-thread I think it probably was like that when they commented, but has since been fixed.)
Yes, it's also automatic in some cases - originally my comment had an attempt at a joke example of it, but it was caught and didn't work. It was definitely improved relatively recently.
I think you're still misunderstanding the proposal. What are the types of your variables? This discussion makes no sense without type information. We're defining the meaning of . for pointers here. You seem to be misunderstanding the proposal as being purely a textual transformation that ignores types?
The proposal is literally: "If you see . and the left operand is a pointer, pretend the . was -> instead, because otherwise the code is already invalid."
OK, I believe that you are right: if implicit dereferencing had been done only when it is the only interpretation that leads to a valid expression, then "->" would not have been necessary.
However, I assume that this would have been too complex a solution for compilers that had to work in a few tens of kilobytes of memory, while a postfix "*", as already used a decade before C, would have been a trivial solution.
> However, I assume that this would have been too complex a solution for compilers that had to work in a few tens of kilobytes of memory
It's literally the same complexity as the type checking compilers already do to tell you that your `.` does not work because the LHS is a pointer not a struct.
It is not the same complexity, because syntax checking stops immediately at an error.
To determine whether implicit dereferencing may be applied, more analysis has to be done, because the operand may be not only a pointer to a structure, but a pointer to a pointer to a structure and so on, so multiple implicit dereferences may be needed to obtain a valid expression.
However I agree that the difference in complexity is not big.
`a.b` is always valid syntax, there's no way to know if it is a valid C instruction until you resolve the types of a and b. Assigning it valid semantics at that point is exactly as easy as assigning it error semantics.
And no, this would not perform multiple levels of dereferencing any more than -> does today. You could have literally find&replaced every use of -> with . and every C program would have had the exact same semantics. `struct point **a; a.x = 1` would throw the exact same compilation error that `struct point **a; a->x = 1` throws today. The only difference would be that `struct point *a; a.x = 1;` would write 1 to the field x of the object pointed to by a, instead of throwing an error that says "object of type struct point* has no field named x".
> As I have written above, a language with implicit dereferencing needs additional syntactic means for denoting the cases when the values of the pointers are needed, e.g. for doing pointer arithmetic, which is frequent in C.
No it doesn’t. Let’s take this example in valid C:
struct my_struct { int field; };
struct my_struct s = { .field = 42 };
struct my_struct *p = &s;
printf("direct : %d", s.field);
printf("pointer: %d", p->field);
printf("address: %x", p);
What is being proposed here is to make that code valid:
struct my_struct { int field; };
struct my_struct s = { .field = 42 };
struct my_struct *p = &s;
printf("direct : %d", s.field);
printf("pointer: %d", p.field); // note the use of a dot here
printf("address: %x", p);
That is, the naked p is still to be interpreted as what it is: a pointer. It’s just that when we write `a.b`, the language would first check the type of `a`, then dereference it as many times as necessary to get to the underlying struct, and then access its field. For instance:
struct my_struct { int field; };
struct my_struct s = { .field = 42 };
struct my_struct *p = &s;
struct my_struct **pp = &p;
struct my_struct ***ppp = &pp;
Now let’s see how this automatic indirection would work:
// All would print the same value
printf("%d", ppp.field);
printf("%d", pp .field);
printf("%d", p .field);
printf("%d", s .field);
// We can still use explicit indirections
printf("%d", (*ppp ).field);
printf("%d", (**ppp ).field);
printf("%d", (***ppp).field);
printf("%d", (*pp ).field);
printf("%d", (**pp ).field);
printf("%d", (*p ).field);
We can still get to the actual addresses no problem:

    printf("%p", (void *)ppp);
    printf("%p", (void *)pp);
    printf("%p", (void *)p);
The kicker here is that the decision on whether an access to a struct member requires dereferencing the pointer or not, is not done at parsing time. It’s done at type checking time. And by the way, in standard C the decision to give you an error or not is already done at type checking time. All this to say, this would be a fairly benign change to compilers.
Now would users get confused? Possibly. With the conflation of pointers and arrays, the following would be equivalent:
array.field
(*array).field
array[0].field
Looks nifty to some perhaps, but it hides which of the three some people really meant.
There is something to the "explicit is better than implicit" mantra. I appreciate the C way because if I write a.b and it's an error, it clarifies what's going on in my head (I now know a is a pointer, not an object).
> There is something to the "explicit is better than implicit" mantra.
I’ll entertain this when C fixes (or at least removes) integer promotion, which is a source of far more bugs and misunderstandings than this could ever be: `.` auto-deref-ing does not actively undermine what little type system C has.
Integer promotion will probably never be removed from the fundamental types (char, signed char, short, etc.). You can use `_BitInt(N)` types in the future, which don't follow the implicit integer promotion rule. There are at least two compilers today that implement it (clang and SDCC).
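To make the trap concrete, a standard example of promotion biting (assuming 32-bit int; illustrative, not from the thread):

    #include <stdio.h>

    int main(void)
    {
        unsigned char a = 0xFF;
        /* ~a promotes a to int first: ~0x000000FF is 0xFFFFFF00, not 0 */
        if (~a == 0)
            puts("no promotion");
        else
            puts("promoted to int");    /* this branch is taken */
        return 0;
    }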
> Integer promotion will probably never be removed from the fundamental types
Oh let me assure you, I have no actual expectation that C would ever change in such a way.
Although it should be noted that both suggestions are entirely syntactic and could be gated behind a per-file (or even per function) stricture a la strict mode.
Man, this is a completely unrelated problem. Implicit operator overloading is just bad design. Integer promotion has advantages and disadvantages, especially in a language like C where you often use 8-bit or 16-bit types for memory-optimization/alignment reasons.
You assert that you do not want this on grounds of clarity. It is not an unrelated problem to point out that C has numerous obscurity issues significantly worse than this could ever be in the language right now.
> Implicit operator overloading is just bad design.
That's at best a bunch of words arranged nonsensically, and at worst an assertion that you want to remove arithmetic operators from the language?
> Integer promotion has advantages and disadvantages, especially in a language like C where you often use 8-bit or 16-bit types for memory-optimization/alignment reasons.
It's mostly a major actual source of obscurity and bugs.
You are not arguing in good faith, but just in case I will point out what you are missing:
1) Implicit operator overloading is bad design in the case of pointers and objects, because a.b and a->b are both readable, easy to type and short.
2) Integer promotion is not as simple, because requiring explicit promotion makes an unholy unreadable mess out of your code in the simplest of cases.
2) is demonstrated many times by mongo code lines in Java with all the explicit casts. This is also the reason Python has implicit promotion for its number type. You sacrifice something to get something, unlike in 1), where you only lose, for no gain whatsoever.
> 1) Implicit operator overloading is bad design in the case of pointers and objects, because a.b and a->b are both readable, easy to type and short.
That's just a baseless assertion. Here, I can do the same: implicit overloading is a great design in the case of pointers and objects, because a.b is always unambiguous and uniform, and -> is a harder-to-read-and-type extra operator with no justification.
And then obviously your assertion can be used exactly the same way to assert that every numeric type should have its own set of arithmetic operators; after all, u4+ is also readable, easy to type, and short.
> 2) Integer promotion is not as simple, because requiring explicit promotion makes an unholy unreadable mess out of your code in the simplest of cases.
Integer promotion is so much "simpler" that it's literally a never-ending source of bugs.
> This is also the reason Python has implicit promotion for its number type.
Python does not have implicit promotion for its number type; at best Python 2 had that, for performance reasons (definitely not to avoid "mongo code lines with all the explicit casts", which it could not care less about), and in reality even that was not the case, because it does not corrupt your data upfront the way C does.
The thing is, the `->` operator in C is completely unnecessary. `.` can be used instead. The compiler can distinguish by looking to see if the lvalue is a pointer or a value.
The result makes it much easier to refactor code. Ever try replacing a value with a pointer in C? Arrghh.
The stack is not some magical location in RAM that is faster, and in any case you can have a pointer to the stack.
It's super common to define a struct on the stack and then pass a pointer to an init function to initialize the value that is on the stack.
Sometimes pointers are faster because you don't have to copy the entire struct.
What largely determines the performance of values vs. pointers is whether that location in RAM is cached, and whether you need to allocate memory on the heap.
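The idiom in question, sketched with hypothetical names:

    struct widget { int id; int state; };

    void widget_init(struct widget *w)
    {
        w->id = 0;
        w->state = 1;
    }

    void use(void)
    {
        struct widget w;     /* lives on the stack */
        widget_init(&w);     /* a pointer to a stack object is fine */
    }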
The prefix * in C comes from prefix ! in BCPL. C’s weird a[b] == b[a] is also a hidden BCPLism.
BCPL has infix ! as well as prefix ! so you write an array index expression like array!index instead of array[index]. Infix ! is commutative like addition and [] in C.
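The C side of that equivalence, for the curious: a[b] is defined as *(a + b), so the operands commute:

    #include <assert.h>

    int main(void)
    {
        char s[] = "hello";
        assert(s[1] == 1[s]);    /* both mean *(s + 1) */
        return 0;
    }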
You declare structures in BCPL by defining a constant for the offset of each member, so you can write object!MEMBER somewhat like C object->member. The semantics of -> in early C were very similar to BCPL infix ! except that members had types as well as offsets, but like BCPL there was nothing to tie a particular member to a particular structure as there is in modern C.
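A rough C rendering of that BCPL convention (names invented): members are just offset constants, tied to no particular structure:

    enum { NEXT, VALUE };        /* member "offsets", BCPL-style */

    int example(void)
    {
        int cell[2] = { 0, 42 };
        int *node = cell;
        return node[VALUE];      /* BCPL would write node!VALUE */
    }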
There’s a certain elegance to BCPL’s syntax that you don’t get from prefix-only or postfix-only indirection operators. C might have been better if it had stuck closer to BCPL in this respect, but sadly * conflicts with infix multiplication.
Some readers might also have used BCPL-style syntax for WIMP programming in BBC BASIC on RISC OS.
> The right way to generalize "for", i.e. with iterators, was introduced by Alphard in the same year as C, i.e. in 1974.) (Compare "for (I=0;I<N;I+=5) {" of C with "for I from 0 to N by 5 do" or "for I in 0:N:5 do" of previous languages.
Really? How do you do C's equivalent of `for(i = 0; i < N && j < M; i++, j++)` with your `for I from 0 to N` syntax? The for loop in C is extremely flexible and it captures the idea of the loop perfectly: there is the initialization block, the exit condition and the iteration. In other languages at that time the for loop was written in terms of a range, but without a strong range abstraction such a loop is really primitive and limited in applicability.
> In other languages at that time the for loop was written in terms of a range, but without a strong range abstraction such a loop is really primitive and limited in applicability.
That's a good thing: it gives you a clear syntactic marker for "this loop definitely terminates". There's already a full power anything-goes looping construct: `while`.
Using the for(;;) loop in Awk, and the C preprocessor, in a project called cppawk, I created a loop facility that has multiple clauses of different types, which can combine in parallel or as a Cartesian product, and are programmer-definable.
The above manual page includes an example of how to define an alpha_range clause that iterates over string ranges like alpha_range(var, "000", "999") or alpha_range(var, "AAA", "ZZZ").
There is a conditional clause if(...) which takes a condition and another clause as arguments. The iteration of the other clause is suspended while the condition is false.
At the shell prompt: add the values of the odd integers in the range 1 to 50.
$ cppawk '
> #include <iter.h>
>
> BEGIN {
> loop (range (i, 1, 50),
> if (i % 2 == 1, summing (sum, i)))
> ;
> print sum
> }
> '
625
Though "if" is an Awk keyword, this isn't a problem, because this if is a clause, not an ordinary expression. Moreover, though the clause is defined by macros, none of them is called "if"; they merely have "if" embedded in their names.
I interpreted that line as two separate claims: firstly, that the correct generalisation of for loops is iterators, which were introduced in Alphard; secondly, an unrelated point about how the C-style generalised for loop is more verbose for the simple case than existing syntaxes.
That is, the commenter was not asserting the supremacy of "from 0 to N" as a for-loop construct, but rather asserting the supremacy of iterators, and pointing out how poor the C-style syntax is even compared to its less-generalised cousins.
Before C and Alphard, all the "for" loops had control variables that took all values of some arithmetic progression.
Even today, such simple loops over arithmetic progressions make up an overwhelming majority of all "for" loops.
The C syntax made writing these simple "for" loops more difficult, by having to write a lot of redundant symbols, instead of writing the minimum number of separators. Also for reading, the redundant symbols obscure the meaningful text.
The C syntax allows the writing of "for" loops where the values of the control variable are not taken from a progression, but are, for instance, the values of the links of a linked list.
This kind of "for" loops can be written in a simple way by using iterators, without complicating the syntax of the loops with simple arithmetic progressions.
In modern C++ there is no longer any case when you would want to use the C kind of "for", but the kinds of loops used by languages like C++ already existed in languages introduced at about the same time as C, e.g. Alphard and Clu.
Your example would be "for i in 0:N for j in 0:M do".
Equivalent syntax was used in Fortran for writing loops in a simpler way than in C, already 20 years before C ("do 10 i=0,N" and "do 20 j=0,M").
The only case when the C "for" is useful is when the third operation is neither an addition nor a subtraction. The most frequent such case is when the operation is a link dereference, for accessing a linked list.
Such cases, for visiting all members of some non-array aggregate data, e.g. linked lists or trees, are solved more clearly with iterators.
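For concreteness, the linked-list "for" being discussed, with a hypothetical struct:

    struct node { struct node *next; int value; };

    int sum_list(const struct node *head)
    {
        int sum = 0;
        /* the third clause is a link dereference, not arithmetic */
        for (const struct node *p = head; p != NULL; p = p->next)
            sum += p->value;
        return sum;
    }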
The for-loop in C is still primitive and limited in applicability, while it makes the most frequent use case much more difficult.
For example, it is normal practice to do some of the initialization outside of the loop. It is normal to deal with finishing the loop by using goto or if-branching after the loop (think of a search that can be successful or not). It is normal to ignore the update part of the loop and do updates in the body, because either the updates are too big to squeeze into the for header, or you need to do something between updating your counter and checking whether to run another iteration.
With all this said, I know two other approaches to the problem.
1. The Common Lisp approach: create a really flexible loop clause, allowing any loop to be represented in a structural way. I'd recommend looking at the cl-iterate package for CL to see what happens to maniacs who choose this way.
2. A less ambitious approach: define some simple loops for the most frequent cases (while, range iteration, iteration by iterator) and a loop for the general case, looking like "loop { iteration }".
I personally prefer the second approach. I loved the C way, then I was a big fan of the Lisp way, but now I believe that it is silly to create a whole new language just to write iteration, and a half-baked attempt to cover more cases with the for-loop than just range iteration has more downsides than upsides.
What you are doing there is not a "for". Not in the mathematical sense of "for all values in a set". The semantics of the word "for" should matter.
If you want to iterate two variables i and j, it is much better to do it explicitly. It will be much more readable.
A better syntax for loops with geometric progressions like this would be obtained by modifying the syntax for writing arithmetic progressions (e.g. "a:b:c") by using a different separator symbol.
For example, if the separator for geometric progressions were ":>", your loop example would become "for bit in 1:>128:>2". Another example, with a (non-ASCII) separator, would be "for bit in 1⋮128⋮2".
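For reference, the C loop being compared is presumably something like this doubling progression:

    #include <stdio.h>

    int main(void)
    {
        /* 1, 2, 4, ..., 128: eight iterations */
        for (unsigned bit = 1; bit <= 128; bit *= 2)
            printf("%u\n", bit);
        return 0;
    }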
Heh, hopefully not too often. First I thought this loops 128 times. Then 7 times. Then 8 times. Then I thought "undefined behavior" if bit is a uint8_t. But then I thought, I don't have freaking clue because I can't keep all of C's implicit casting and arithmetic operation rules in my head.
I have the feeling the generalized "for" is nice because it allows multiple initializers in a compact manner; how does "for I from 0 to N by 5" deal with this?
Or we could use a different operator for pointer indirection and multiplication? There’s also the pointer/array conflation that could cause some confusion.
>because it complicates the frequent use cases in order to simplify seldom encountered use cases.
Putting complex logic in the three places of a for loop is something we all frown on now because of readability and bugs, but back then it was almost more common than just iterating from 1 to N. Same for dropping through case statements - frowned on now (rightly) because of gotcha bugs, but used heavily.
>Making "*" a prefix operator in B and C was a mistake
C dominates for the same reason it is terrible: it gets shit done for some value of "shit". In this light, there are no mistakes. There are other languages that don't make these "mistakes" and, behold, nobody wrote the majority of the world's operating systems and software in them. I've used C when the viable options were hand-crafted assembly or C. There are no mistakes here, only practicalities.
> Compare "for (I=0;I<N;I+=5) {" of C with "for I from 0 to N by 5 do" or "for I in 0:N:5 do" of previous languages. C requires typing a lot of redundant characters in the most frequent cases.
Um, "for (I=0;I<N;I+=5) {" is fewer characters than "for I from 0 to N by 5 do".
Sure. And the observant will note that I didn't claim C was longer than that version. The other one, "for I ∈ 0:N:5 do", is the shorter one. Even if you replace "∈" with "in", it's still shorter.
Still... when did that syntax come out? Was it in Algol 68, or PL/I? It seems a bit unfair to complain about C being verbose if it was shorter than anything else available at the time.
"∈" was actually proposed for Alphard, at the same time with C.
When later implemented, including in an Algol 68 variant that inspired the UNIX Bourne shell, "∈" was replaced with "in", which could be written with ASCII.
Algol 68 could use either keyword pairs, like "do" and "od" (which became "do" and "done" in the UNIX Bourne shell) or, optionally, various kinds of parentheses in their place, for conciseness.
The notation "0:N:5" for arithmetic progressions, or "0:N" when the step is 1, is ancient. Even Fortran used it, but with the mistake of using a comma instead of a colon, which introduced a syntactic ambiguity, because the comma was also used for other purposes. Using the colon started with Algol 60, but that language preferred keywords to symbols in the arithmetic progressions used inside "for", so it did not use the notation consistently.
Fortran in 1954, or the language of Heinz Rutishauser in 1951 (the first with a "for" statement), were already much more concise than C.
It’s not. Not for humans (the extra parenthesis is just useless clutter), and not for machines (I’ve written parsers, both alternatives are as easy to implement).
Yup. But as I understand it @ wasn't available on the terminals used to create C. Likewise #. They were used for later features, once they could actually be typed. Happy outcome, that there were characters available for fresh features, since C used absolutely everything available on the original (ADM?) terminals.
Another curious thing: Unix came shortly after and was entirely typed in lower case. Because those same terminals had only one font (font ROMs were tiny back then, and expensive), it looked like UPPERCASE, but in fact the keyboard produced lowercase. Once better terminals were used to edit the code, it was noticed for the first time(?) that it had all been edited as lowercase.
Or so the story went, back in those days, when it was all new.
The DEC PDP-11 assembler used the @ symbol for dereferences of registers ("@Rn" or "(Rn)" dereferenced a register, for example).
However, in Unix, the terminal convention was to use the @ symbol to delete the current line, and they didn't have a DEC assembler yet, so they used the asterisk (*) instead in B, as well as in the Unix PDP-11 assembler, which was written in B.
As far as I know, the @ on modern keyboards (above the 2) comes from the IBM Selectric[1], released in 1961; the Model D and the VT-52 both copied the keyboard layout. The 1963 ASCII draft had @, despite not having lowercase yet.
Prior to IBM, it was on at least the Underwood typewriters[2], so it was never quite absent.
1963 ASCII was upper-case only, and many early terminals (notably for Unix, including the Teletype 33) were upper case only. `stty lcase`, which maps upper-case input to lower case, existed early — https://www.tuhs.org/cgi-bin/utree.pl?file=V1/man/man2/stty....
Instance-variable heavy Ruby code does look rather like someone has sneezed on the monitor, so I think it would have been a bad move, aesthetically ...
At the very least, & and * are in the same place on many keyboards; I've been bitten by muscle memory many times going back and forth between US/UK/FR keyboard layouts.
For reference, it's above the ' and between the ;: and #~ keys on my current board, rather than on shift-2 (where " lives).
Maybe this would have stayed consistent worldwide if it had been used in C.
The oldest use of @ to mean "at" can be traced to typewriters for commerce around 1880, where it was used as in "5 apples @ 10 p", meaning "5 apples at 10 pence each."
At least that's what the German Wikipedia claims, while the corresponding paragraph in the English Wikipedia is short.
It's also the "a commercial" in Quebec French [1]. Despite lacking the accent on the A (à), the intention of the A is to stand in for the single letter French word "à" (at) [2].
More recently (but still 1968, before email): "In ALGOL 68, the @ symbol is brief form of the at keyword; it is used to change the lower bound of an array. For example: arrayx[@88] refers to an array starting at index 88."
There are claims it comes from the Latin "ad", similar to how & comes from the Latin "et", but the etymology is far less certain, with the French "à" being another possible source.
What I can't find is when it was first used in the US for indicating home-team for sports. E.g. [1] where there is vs. or @ depending on whether it is a home or away game. I suspect it's relatively modern, but not sure how far back it goes.
My stepdad is from NJ and pronounces toilet as tore-let. He’s a real character and now I don’t know if you’re serious or not and it’s hilarious to me as a non-NJ person who has visited and met his side of the family.
Same! Someone should post a video/audio recording of the pronunciation. Even trying to imitate different accents I'm familiar with, I can't find one where the two words sound even remotely similar lol
Interesting, I've always thought it's & pronounced as Latin `et', which sounds like `at', so &x gives a pointer that "points at" x.
Also, the "near on the keyboard" idea doesn't make sense anyway because C was developed on a bit-paired tty33 where the two symbols are not that close to each other.
And C++/CLI uses ^ for "managed pointers" (pointers to .NET objects) and % for "managed references", which means there are altogether 4 ways to declare various types of pointers, which is super fun.
I have a pet theory that '#' and '*' have such prominent roles in C because Thompson and Ritchie developed B at Bell Labs in 1969, when the first push-button phones were appearing with '#' and '*' buttons.
TIL "ampersand" etymology.
from the Oxford dictionary:
"Origin: late 18th century: alteration of and per se and ‘& by itself is and ’, formerly chanted as an aid to learning the sign."
I've taught C and C++ extensively commercially, and I really can't remember anyone having real problems with understanding pointers, despite some trepidation they might have had when the topic was first introduced.
There are usually patterns that you match the code you're reading against. For example, the simple pattern of

    int f(int x, int y, int len, int *z);

and you immediately think "oh OK, they're doing something with x and y, and z is the out parameter; the function is also likely going to allocate". Usually it matches what you see in the function body and it's all good. In (hopefully) rare cases you view it with such a lens and something doesn't make sense, and you have to scrutinise the code for what it's actually doing. This is all without meaningful variable names; proper names make it much more straightforward to read, e.g. calling the function vec_add.