@C -- Augmented Version of C Programming Language
Abstract
An augmented version of the C programming language, @C, is presented. The language is extended with a series of low-level and high-level facilities that broaden its range of use across various computing systems, operations and users. Ambiguities and inconsistencies are resolved by treating problematic and undefined language elements in a manner similar to that used by other languages based on C syntax. The proposed augmentation elements of the @C approach preserve the spirit of C and its basic characteristics through compatibility with the standard version, while also rejuvenating C and bringing it to the present state of the art of programming languages.
@C Characteristics
/*
This is a block
or multi-line
comment!
*/
The single-line comment starts with // (when these characters are not placed inside a string or a multi-line comment) and ends at the end of the line.
Similar to other languages based on C syntax, @C does not allow a single-line comment to be continued onto the next line by a trailing line-continuation escape character \.
The basic punctuators used in @C are the same as in C. However, @C does not accept them in ambiguous combinations, or in combinations whose interpretation requires priority rules that are not immediately obvious to the ordinary user. For example, an expression like i+++j is considered ambiguous and is reported with a notification/warning or an error. This guides the user to be more rigorous and to write it as intended, either as i++ +j or as i+ ++j. Besides the aesthetic aspect, this approach makes code more readable and agrees with good practice, which recommends separating successive operators by spaces.
Regarding tokens and identifiers, as well as string literals, the @C language is flexible because it lets programmers describe elements in any natural language. In this respect, it is useful to accept identifiers written in different languages (some dialects/compilers restrict this). For the ordinary user, names often also carry semantic information about language elements in the absence of related comments, which can themselves be written in any language. The advantages of restrictions (such as stricter error control) must be balanced against the advantages of flexibility (such as easier and more concise use of the language by programmers worldwide).
Even if, in general, two types of control instructions are needed [8], for the initial and final stages of C compilation it is enough to extend the preprocessing control instructions and make them flexible enough to be used for output-generation decisions, optimizations etc. To achieve this, the preprocessing stage must be interleaved with the coding stage. This interleaving does not essentially affect the preprocessing of previously defined macros. In other words, the specific elements of the preprocessing stage must be combined with the language elements, even though (for historical reasons) they were introduced and managed almost independently of the language itself. The C language was structured to be processed in one logical compilation step/pass through its explicit pre-declarative structure. Therefore, the @C approach establishes two-way correlations between the preprocessing stage and the compilation/coding stage.

For this purpose, on the one hand, the control instructions are extended with other types of instructions (such as repetitive or cycling instructions like #while) [8], besides the standard #ifdef / #ifndef / #if / #else / #endif. On the other hand, their conditions can also check characteristics specific to the coding process, such as whether a parameter/variable has been declared, coded, used etc. The macro-specific check defined keeps its preprocessing meaning (for compatibility), while its counterpart for pure/codable (non-macro) language identifiers is coded, with an analogous meaning. Because preprocessing and coding operations interact and alternate in @C, the preprocessing control instructions can be used in the coding stage, even for code optimization. For instance, if a declared procedure might not be used (called / invoked / referred / assigned), one may conditionally include or exclude its forward definition/coding area:
#if used P
void P(){/*..*/}
#endif
This is useful for eliminating the code/body of an unused/uncalled procedure etc. These interactive (macro and coding) facilities are compatible with previous implementations of preprocessing, which were considered (and implemented in the usual compilers) as distinct from coding operations. They only make these operations more flexible. Because the facilities can be used in the coding stage, compilation processes and decisions become explicitly accessible to language descriptions, without requiring specifications in other, extra languages to perform certain compiling operations.
Numeric literals are accepted in @C in various forms/bases, for compatibility of expressions with other languages. This also includes direct descriptions in base 2, neglected over time by the C standard, although they are extremely useful for direct access to the bits of a numerical value/parameter, especially from a low-level perspective. Also, to ensure compatibility with other C dialects and readability of large numbers, the underscore character _ is accepted for fragmenting and grouping numeric literals, as in:
0b1_0000_0000 for 0b100000000, 1_000_000 for 1000000, 0xFF_FF for 0xFFFF etc.
One of the limitations of the C language when operating with binary parameters and operations has been that zero-terminated strings cannot safely hold binary data. These limitations are removed in @C. There, the declaration of an array without an explicitly specified number of elements is treated as a variable length array managed as a dynamic array
int A[];
compatible with the usual static array, but whose number of elements can be determined through the internal length function. In this way, @C facilitates the management of binary-safe strings (which can contain \0). Some limitations remain, related to access to the definition/allocation of S for a proper use of sizeof in the previous example. These limitations do not exist if the S string is passed to another variable and its content is also managed as a variable length array (dynamic array), to which length can be applied.
In usual C (except for some versions), the declaration of an uninitialized array variable without a number of elements is accepted in some particular cases, corresponding to static array declarations. In @C, these are written in their explicit form. One example is the last member of a struct, used to map as an array the elements that follow the structure's address. In @C this facility is accepted only in the explicit and correct form
struct S
{
//..
int L[0];
};
where the L member is the correct and explicit definition for the post-struct data mapped by the L field, with zero space allocated in the struct, for both @C and C. This is how a flexible array member of a struct must be declared and used, while member A in the following structure
struct S
{
//..
int A[];
};
is managed as a dynamic array in @C. This differs from the C standard, for which the last two descriptions are equivalent, the latter being the inappropriate definition of a struct flexible array member.
There are a number of inconsistencies/ambiguities in how common C compilers evaluate expressions. They are often inherited from early C compiler implementations and from the fact that the concept of expression has evolved over time. These inconsistencies are an important source of errors, side effects, undefined/unpredictable behaviors/results etc. They originate primarily from the compilers' arbitrary, unsequenced implementation of statements and expressions. This problem can be solved by assuming sequencing and a left-to-right parsing rule. For example, the reader can first try to estimate what the following code will display and then test it on different compilers,
most likely obtaining values different from the estimated ones and also different from the values expected by ordinary programmers. A correct and consistent implementation should print the values 3, 5, identical to the values obtained by similar code written in other languages that inherit the C syntax (such as Java, JavaScript etc.). The present @C version of the C language resolves these ambiguities by defining more rigorously the pre- and post-increment and decrement operations, the evaluation order of procedure arguments etc.
The sequence point ambiguity in usual C expressions can be highlighted by analyzing the following example,
for which most C compilers display 1, 0 rather than 0, 1, as ordinary programmers expect. Compilers inherited this problem from an erroneous implementation choice. According to that choice, evaluating arguments from right to left, in the order in which they are pushed onto the stack in standard C, is faster, and this speed matters more than the rule of logically processing procedure arguments from left/first to right/last. In fact, at that time, right-to-left processing of arguments was a convenient implementation choice rather than a speed issue, and speed was invoked to justify this unwarranted deviation from the rule. Clear rules and restrictions on expression evaluation order are essential. Rules, which reduce errors, are more important than slightly faster but unpredictable code generation (faster by an irrelevant few percent) tied to difficulties or limitations of some particular system implementations. Over time, the implementation was kept for compatibility with previous implementations. It was later classified as unspecified behavior, seen as a freedom left to compilers to achieve faster implementations. To solve this problem, the default @C calling and argument evaluation convention is a new one, named ccall. In ccall, arguments are evaluated from left to right and pushed onto the stack from right to left (as in cdecl). Usual C, by contrast, uses cdecl as the default calling convention, in which arguments are also evaluated from right to left. These aspects can be easily highlighted through the following procedure, called with
which prints A-B with the proposed @C ccall, as P(0, 1), and B-A with the usual cdecl convention, as P(1, 0). Evaluation from the last argument to the first (cdecl) often introduces errors that are difficult for the ordinary user to identify/track. The arguments are not processed in the natural order in which they appear in the enumeration (call) list, as ordinary programmers expect and as happens in other languages.
Similar to higher-level languages and other modern system languages, the @C language allows the definition of local procedures, in both (pure) nested (N) and closure (C) forms, as in the following example
ResultType Parent(int A)
{
int V = 0;
volatile void N(){/*..*/}
void C(){/*..*/}
//..
//ExternCVar = C;
//return C;
}
in which the volatile directive indicates that N is managed as a nested procedure (safe to call only until the parent procedure returns). Local procedures without the volatile directive, such as C, are closure procedures. They are safely exportable and callable at any time, with contextual preservation of the parent environment they use, i.e. preservation of the used parent/ascendant arguments and local variables. Besides easy management of local procedures at the parent block level, this facility allows the language to be used for higher-level operations, asynchronous exchange of information through safe contextual callbacks etc. Along with these high-level procedural features, @C also accepts a series of related high-level data types and operations, to facilitate concise descriptions, non-critical operations etc.
On the low-level side, the asm directive in @C allows machine-level descriptions to be used more easily, in addition to assembly ones. Some preprocessing directives are also correlated with the coding directives with respect to low-level operations. Other directives used in different versions or proposed (such as #embed) are integrated into easy resource descriptions related to some system formats etc. Also, to facilitate connections between low-level blocks, the definition and use of labels and the related goto instruction are made more flexible. This allows easy implementation of various low-level optimizations, such as tail calls in recursive procedures.
The purpose of a programming language is to mediate the description of operations from the human level to the machine level through the compiler, the fundamental translator of computing systems [8]. In @C, the way the compiler manages a series of operations is exposed explicitly at the language level, and not just through compiler directives as in conventional C. Moreover, since many of today's systems are interconnected, @C facilitates the description of related operations not only at the library level but also at the internal and compiler level. For example, the main script server functions should not be exclusive to scripting languages but should also be available in programming languages. In this sense, the @C language and its compiler also work as a (script) server, generalizing the source concept to accept inputs not only as files but also as ports, addresses etc. By including the most representative low- and high-level elements, the present augmented @C version of the C language brings it to the current state of the art of programming languages, in accordance with the original language structure, in the spirit of C.
Conclusions
The proposed augmented version of the C language makes the language more flexible and optimizes it in a general way, with facilities that keep the update compatible with the standard language version.
The language includes low-level elements that allow a more adequate management of hardware components and embedded systems. It also includes representative high-level language elements that allow easier coding from the user's perspective.
The ambiguities and inconsistencies have been resolved to make the language compatible with languages that use the same type of syntax, by eliminating unpredictable or unexpected behaviors or results. The language follows rules to a greater extent and avoids exceptions, even if some exceptions to the rules could generate slightly faster code on some particular systems.
A series of elements have been introduced to facilitate the direct compilation of code. These elements allow the implementation of single-tool compiler systems capable of managing resources and generating application installation archives, without other paralanguages, tools, dependencies, auxiliary languages etc.
References
[1] D. M. Ritchie, The Development of the C Language, ACM 28, 201-208 (1993).
[2] D. M. Ritchie, K. Thompson, The UNIX Time-Sharing System, Bell System Technical Journal 57, 1905-1929 (1978).
[3] B. Stroustrup, The C++ Programming Language, Addison-Wesley (2013).
[4] A. Hejlsberg, M. Torgersen, S. Wiltamuth, P. Golde, The C# Programming Language, Addison-Wesley (2008).
[5] W. Bright, A. Alexandrescu, M. Parker, Origins of the D programming language, Proceedings of the ACM on Programming Languages 4, 1-38 (2020).
[6] J. Gosling, B. Joy, G. Steele, G. Bracha, The Java language specification, Addison-Wesley (2000).
[7] B. Eich, C. R. McKinney, JavaScript Language Specification, Netscape Communications 2 (1996).
[8] I. I. Petrila, Implementation of general formal translators, arXiv:2212.08482 (2022).
[9] B. Heim, M. Soeken, S. Marshall, C. Granade, M. Roetteler, A. Geller, M. Troyer, K. Svore, Quantum programming languages, Nature Reviews Physics 2, 709-722 (2020).
[10] F. T. Chong, D. Franklin, M. Martonosi, Programming languages and compiler design for realistic quantum hardware, Nature 549, 180-187 (2017).
[11] D. Marković, A. Mizrahi, D. Querlioz, J. Grollier, Physics for neuromorphic computing, Nature Reviews Physics 2, 499-510 (2020).
[12] P. Stoewer, C. Schlieker, A. Schilling, C. Metzner, A. Maier, P. Krauss, Neural network based successor representations to form cognitive maps of space and language, Scientific Reports 12, 11233 (2022).
[13] S. Peta, C Programming Language Still Ruling the World, Global Journal of Computer Science and Technology 22, 1-5 (2022).
[14] L. Beringer, A. W. Appel, Abstraction and subsumption in modular verification of C programs, Formal Methods in System Design 58, 322-345 (2021).
[15] S. Natarajan, D. Broman, Timed C: An extension to the C programming language for real-time systems, IEEE Real-Time and Embedded Technology and Applications Symposium 24, 227-239 (2018).
[16] S. H. Park, R. Pai, T. Melham, A Formal CHERI-C Semantics for Verification, arXiv:2211.07511 (2022).
[17] J. Dumas, H. P. Charles, K. Mambu, M. Kooli, Dynamic compilation for transprecision applications on heterogeneous platform, Journal of Low Power Electronics and Applications 11, 28 (2021).
[18] R. Chatley, A. Donaldson, A. Mycroft, The next 7000 programming languages, Computing and Software Science, 250-282 (2019).
[19] C. Lattner, M. Amini, U. Bondhugula, A. Cohen, A. Davis, J. Pienaar, R. Riddle, T. Shpeisman, N. Vasilache, O. Zinenko, MLIR: A compiler infrastructure for the end of Moore's law, arXiv:2002.11054 (2020).
[20] J. Chen, J. Patra, M. Pradel, Y. Xiong, H. Zhang, D. Hao, L. Zhang, A survey of compiler testing, ACM Computing Surveys 53, 1-36 (2020).
[21] Z. Zhou, Z. Ren, G. Gao, H. Jiang, An empirical study of optimization bugs in GCC and LLVM, Journal of Systems and Software 174, 110884 (2021).