\chapter{Basics of C}
Every language has rules: a grammar which dictates the way the language is
spoken and written, and a script in which to write it. Similarly, a
programming language has a grammar, typically a context-free grammar written
in BNF (Backus-Naur Form). It has a set of valid characters and
a set of keywords. It has constructs to handle control flow, loops
etc. It provides facilities to deal with numbers and strings
separately, to reuse code, and some basic data structures to facilitate
programming. However, the rule-set of a programming language is very small
compared to that of a natural language. Also, when you use a natural
language, for example when talking to someone or writing something, the other
person can understand your intent even if you break a rule, but in programming
you cannot violate the rules. Compilers and interpreters cannot deduce your
intent by reading code. They are not intelligent. If you make a mistake, the
compiler will refuse to accept your program until you correct it. Therefore, it
is essential to understand these rules clearly and correctly.
\section{The C Character Set}
The following characters form the C character set, as given in
\S(iso.5.2.1):
\begin{Verbatim}[frame=single]
[a-z] [A-Z] [0-9] ~ ! # % ^ & * ( ) - = [ ] \ ; ' , . / _ + { } | : " < > ?
\end{Verbatim}
\index{character set}
This means that, along with the other symbols, you can use all the letters of
the English alphabet (both uppercase and lowercase) and the Arabic numerals.
Symbols like \texttt{\$} and \texttt{@} are not part of C's character set, but
strings can contain these characters. A string is a sequence of characters
enclosed in double quotes; a double quote inside a string is escaped with
\texttt{$\backslash$}. \texttt{\$} and \texttt{@} can also be the value of a
character. A character is a value containing a single character within single
quotes. We will see more of these in their individual sections. However,
English is not the only language in the world. In some non-English-speaking
countries there are keyboards on which certain characters of the above set are
not present. The C standard provides for this in the form of trigraph
sequences. Given below is the trigraph sequence table given in
\S(iso.5.2.1.1):
\begin{table}[H]
\begin{center}
\caption{Trigraph Sequences}
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\textbf{Trigraph}&\textbf{Equivalent}&\textbf{Trigraph}&\textbf{Equivalent}&\textbf{Trigraph}&\textbf{Equivalent}\\
\hline
??=&\#&??'&\textasciicircum&??!&|\\
\hline
??(&[&??)&]&??$<$&\{\\
\hline
??$>$&\}&??/&\textbackslash&??-&\textasciitilde\\
\hline
\end{tabular}
\end{center}
\end{table}
\index{trigraph sequences}
However, you should refrain from using trigraph sequences for portability
reasons as suggested by GNU coding standards.
\section{Keywords}
The following words, given in \S(iso.6.4.1), are reserved as keywords of the C
programming language. You are not allowed to use them for anything other than
what they are meant for:
\index{keywords}
\begin{table}[H]
\begin{center}
\caption{Keywords of C}
\begin{tabular}{l l l l l}
auto & break & case & char & const\\
continue & default & do & double & else\\
enum & extern & float & for & goto\\
if & inline & int & long & register\\
restrict & return & short & signed & sizeof\\
static & struct & switch & typedef & union\\
unsigned & void & volatile & while & \_Alignas\\
\_Alignof& \_Atomic & \_Bool & \_Complex & \_Generic \\
\_Imaginary & \_Noreturn & \_Static\_assert & \_Thread\_local\\
\end{tabular}
\end{center}
\end{table}
Each of these keywords serves a specific purpose. You will come to know about
all of them as you progress through the book.
\section{Identifiers}
The names which we give to our variables, functions and other entities are
known as identifiers \S(iso.6.4.2). Please read this section carefully and make
sure you understand the rules for naming identifiers. At the end of the chapter
there are some simple problems to practice with.
An identifier can name a variable, a constant, a function, a tag or a member
of a structure, union, or enumeration, a typedef name, a label name, a macro
name, or a macro parameter. We will see all of them as we progress. In other
words, since memory locations are difficult for us human beings to remember,
we give these memory locations more meaningful names in the form of
identifiers. You have already seen what is allowed in C's character set, but
not all of those characters are allowed in an identifier's name. Only the
letters of the English alphabet, both lowercase and uppercase (they are
distinct), the Arabic digits from zero to nine, and the underscore (\_) are
allowed. Among these, a name can begin only with a letter or an
underscore. A digit must not be the first character. For example, \texttt{x,
\_myVar, varX} and \texttt{yourId78} are all valid names. However, take care
with names starting with an underscore, as many of them are reserved for the
implementation, that is, the compiler and the standard library. Invalid
identifier examples are \texttt{9x, my\$} and \texttt{your age}. An identifier
may also contain extended characters (other than those mentioned above, for
example Chinese, European or Japanese characters), written as universal
character names, with some restrictions on which of them may be the first
character.
At least 31 significant initial characters in an external identifier, and 63
in an internal identifier, are guaranteed across all platforms, as specified
in \S(iso.5.2.4.1).
\section{Programming}
Let us see our first program and try to understand what it does.
\begin{minted}[frame=single]{c}
// My first program
/* Description: This program does nothing.*/
#include <stdio.h>
int main(int argc, char* argv[])
{
    return 0;
}
\end{minted}
You can now issue the command \texttt{\$gcc nothing.c}, where
\texttt{nothing.c} is the filename under which you saved the source code. Note
that \texttt{\$} is the prompt, not part of the command itself. If you then run
\texttt{ls}, you will find that \texttt{gcc} has produced a file named
\texttt{a.out}. Now you can run this program by typing \texttt{\$./a.out}, and
nothing will happen. But if you type \texttt{\$echo \$?}, you will find that 0
is printed on the screen, which is the value given after \texttt{return} in our
program.
As you can see, this program does almost nothing, but it is a fairly complete
program and we can learn a lot about C from it. Let us try to dissect it line
by line. The first line is a comment.
Whenever the C compiler encounters \texttt{//}, it treats the rest of the line
as a comment, i.e. it does not compile it. This type of single-line
comment was introduced in the C99 standard, and if your compiler is really old
it may give you an error message about it. The second line is
also a comment. Anything between \texttt{/*} and \texttt{*/} is ignored, just
like the text after \texttt{//}. However, be careful of something like
\texttt{/* some comment */ more comment */}. Such comments will produce error
messages and your program will fail to compile. The reason is that the comment
ends at the first \texttt{*/} which the compiler encounters, and the remaining
portion, which we intended to be part of the comment, causes a
syntax error.
Comments are an integral part of programming. They are used to describe
various things, and you can write whatever you want in them. They may also be
used to generate documentation with tools like doxygen. Typically, comments
should tell what the program is doing, not how. Sometimes the how can be
covered too, when the logic is really complex. One should be generous while
commenting the code.
The next line is \texttt{\#include <stdio.h>}. \texttt{\#include} is a
preprocessor directive. Preprocessor directives are handled by the C
preprocessor. The name of the file to include comes after \texttt{\#include},
either in angle brackets or in double quotes. On a typical Unix-like system the
GNU C preprocessor looks for files named in angle brackets in four directories
by default, of which one or possibly two are of interest for
now. Depending on the way your compiler
is installed, the file \texttt{stdio.h} may be in \texttt{/usr/include} or
\texttt{/usr/local/include}. It may also be in a non-standard path,
although that is unlikely, and the search path is then controlled by
options whose discussion is beyond the scope of this book. Let us say
\texttt{stdio.h} is present in one of the aforementioned directories. The C
preprocessor then copies its contents and pastes them into the source file,
adding line markers along the way which record the original file names and
line numbers for diagnostic and debugging purposes. The related \texttt{\#line}
directive is discussed later in the chapter which deals
with macros. You can see the output of the C preprocessor by typing \texttt{\$gcc
-E nothing.c}. Since the output will scroll a lot on your terminal, you can
pipe it to a pager like \texttt{less} to read it. The \texttt{-E} option tells
\texttt{gcc} to stop after preprocessing and not to compile or link the file.
The next line is \texttt{int main(int argc, char* argv[])}. This is a very
special function. Every complete executable C program has exactly one
\texttt{main} function, unless you do assembly hacking (shared objects, DLLs
and archive libraries do not have a \texttt{main} even though they are built
from C code). This function is where
the program starts. The first word, \texttt{int}, is a keyword which is
shorthand for integer. It signifies the return type of the function.
\texttt{main} is the name of the function. Inside the parentheses you see
\texttt{int argc}, which tells how many arguments were passed to the program
and is short for argument
count. \texttt{char* argv[]} is an array of pointers to strings, which we will
see later. For now let us just remember that it holds all the arguments to the
program, including the program name.
Next is a brace. Scope in C is determined by braces. Something declared outside
any brace has global scope (we will see this later), something declared inside
the first level of braces has function or local scope, and something declared
inside a second or deeper level of braces has the scope of that particular
block. Scope here means that at the closing brace an ordinary variable which is
valid in that scope will cease to exist. However, we do not have to worry about
that yet, as we do not have any variables. Just note that the corresponding
closing brace will be the end of the \texttt{main} function. For every opening
brace which starts a scope, a closing
brace is mandatory.
The next line is \texttt{return 0;}. This means that whoever has called
\texttt{main()} will get 0 as the return value. In this case, the receiver is
the shell or operating system
which invoked the program. The semicolon is called the statement terminator; it
is also used in languages such as Java and C++. Its purpose is to
terminate the statement so that the next statement can begin.
However, the program shown does not do much. Let us write a program which has
some more functionality and we can explore more of C. So here is a program
which takes two integers as input from users and presents their sum as
output. Here is the program:
\begin{minted}[frame=single]{c}
// My second program
// Author: Shiv S. Dayal
// Description: It adds two numbers
#include <stdio.h>
int main()
{
    int x=0, y=0, sum=0;
    printf("Please enter an integer:\n");
    scanf("%d", &x);
    printf("Please enter another integer:\n");
    scanf("%d", &y);
    sum = x + y;
    printf("%d + %d = %d\n", x, y, sum);
    return 0;
}
\end{minted}
and the output is:
\\\\\texttt{shiv@shiv:\textasciitilde/book/code\$ ./addition\\
Please enter an integer:\\
\textbf{7}\\
Please enter another integer:\\
\textbf{8}\\
7 + 8 = 15}\\\\
Note that \texttt{shiv@shiv:\textasciitilde/book/code\$} is the prompt.
Let us discuss the new lines one by one. The line \texttt{int x=0, y=0, sum=0;}
declares three variables of type \texttt{int} and initializes them. The
\texttt{int} keyword in C is used to represent integers. Now we have three
integers with their values set to 0. Note how the variables are separated by
commas and the statement is terminated by a semicolon (as we saw in the last
program). We could have also
written it like this:
\begin{minted}[frame=single]{c}
int x;
int y;
int sum;
x = 0;
y = 0;
sum = 0;
\end{minted}
or
\begin{minted}[frame=single]{c}
int x, y, sum;
x = y = sum = 0;
\end{minted}
However, the first method is preferred, as it prevents the use of a variable
before it has been given a value. \texttt{int} is a data type in C.
\texttt{x, y,} and \texttt{sum} are
called variables of type \texttt{int}. This means that the size of each of
these variables is the same as the size of \texttt{int}. Note that
C is a statically typed language and the size of every type is fixed by the
compiler at compile time. In our case, \texttt{int} typically requires 4 bytes
on 32-bit and 64-bit
systems but 2 bytes on 16-bit systems.
Let us learn a bit about \texttt{printf}. This function is declared in
stdio.h. The prototype of \texttt{printf()} is
\begin{minted}[frame=single]{c}
int printf(const char *restrict format, ...);
\end{minted}
The first argument, \texttt{format}, is what we have in the first two function
calls. The second is \texttt{...}, which means that the function can take a
variable number of further arguments. We have seen this in the third call. In
other words, \texttt{printf()} takes a string followed by an optional, variable
number of arguments. The string is called the
format string and determines how the supplied arguments are printed. In the
first two \texttt{printf()} statements we just print the format string, so that
is simple. In the last one, however, the format string contains \texttt{\%d},
which stands for a decimal integer. Each \texttt{\%d} is replaced by the next
argument, so the integers are printed in the same order in
which they were supplied.
\texttt{scanf()} is the scan function, which reads formatted input from the
keyboard. By now you know that \texttt{\%d} is for a decimal integer, but
notice that we have not passed \texttt{x}
or \texttt{y} themselves. The reason is that \texttt{x} and \texttt{y} stand
for the values stored in the variables, while
\texttt{\&x} and \texttt{\&y} are the addresses of \texttt{x} and \texttt{y} in
memory. \texttt{scanf()} needs the memory address to which it can write the
input. You will see the \texttt{\&} operator in action later when we deal
with pointers. Just remember for now that using a simple variable with
\texttt{scanf()} requires \texttt{\&} before its name.
So far we have seen only the \texttt{int} data type, but there are more
data types for other kinds of numbers, characters and strings. Let us see them
one by one.
\section{Data Types}
What are data types? Why does C need data types? C is a statically typed
language, that is, every variable has a type associated with it. Types are
discussed at great length in the specification, in \S(iso.6.2.5) to
\S(iso.6.2.8).
These types determine
what kind of values the variables can hold and how those values will be
interpreted. For example, the same bit pattern in memory can stand for the
character `A' or for the number 65 (in ASCII); without a type there is no way
to tell which one is meant. Also, mathematical
numbers range from $-\infty$ to $\infty$, but a variable occupies a fixed
amount of memory, and since C is statically typed the
sizes of data types have to be known at compile time. Because the data type is
known, a compiler can detect at compilation time whether we are storing the
correct type of values in the correct type of variables. It also allows the
compiler to do certain optimizations which affect the runtime performance of
the program. In this book we will group the data types into four
kinds: integral, floating-point, arrays and pointers. Here, I will deal
with the former two and leave the latter two for later. The integral types
are \texttt{char, short int, int, long} and \texttt{long long}, and the
floating-point types are \texttt{float, double} and \texttt{long
double}. \texttt{signed} and \texttt{unsigned} are sign modifiers, which
modify the range of the integral types but do not affect their memory
requirements. By default the integral types other than \texttt{char} are
\texttt{signed}, and you must qualify your variables with \texttt{unsigned} if
you want that behavior; whether a plain \texttt{char} is signed depends on the
compiler. \texttt{short} and \texttt{long} are modifiers for the size which the
data type occupies, but I consider them different types because their memory
requirements are different. The ranges of the integral data types directly
reflect their memory requirements, and if you know how much memory they occupy
you can easily compute their ranges. The range of the floating-point types comes
from the IEEE specification. IEEE standard 754 governs the binary
representation of floating-point numbers, which you can read about at
\url{http://www.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF}. You can
also buy the standard from IEEE's website. I will describe it later.
Let us write a program to find out the sizes of the basic data types:
\begin{minted}[frame=single]{c}
// Description: It prints the sizes of the basic data types in bytes
#include <stdio.h>
int main()
{
    // sizeof yields a value of type size_t, which is printed with %zu
    printf("Size of char is..........%zu\n", sizeof(char));
    printf("Size of short int is.....%zu\n", sizeof(short int));
    printf("Size of int is...........%zu\n", sizeof(int));
    printf("Size of long is..........%zu\n", sizeof(long));
    printf("Size of long long is.....%zu\n", sizeof(long long));
    printf("Size of float is.........%zu\n", sizeof(float));
    printf("Size of double is........%zu\n", sizeof(double));
    printf("Size of long double is...%zu\n", sizeof(long double));
    return 0;
}
\end{minted}
Here \texttt{sizeof} is a compile-time operator which computes the size, in
bytes, of the type passed to it as an argument. So the program computes the
sizes of all the data types shown. The output is given below:
\\\\\texttt{Size of char is..........1\\
Size of short int is.....2\\
Size of int is...........4\\
Size of long is..........8\\
Size of long long is.....8\\
Size of float is.........4\\
Size of double is........8\\
Size of long double is...16\\\\}
Please note that the output shown is from a 64-bit machine; it may be different
on 32-bit machines.
\section{Integers}
Integers are probably the simplest to understand of all the data types in C, so
I am discussing them before any other type. As you have seen, the keyword for
declaring an integer is \texttt{int}. An \texttt{int} is usually 2 or 4
bytes. A 16-bit compiler will typically have an \texttt{int} of 2 bytes, while
a 32-bit or 64-bit compiler will typically have a 4-byte \texttt{int}. The
specified minimum range of an \texttt{int} requires at least 2
bytes. Since most modern computers are either 32-bit or 64-bit, with
64-bit becoming more dominant, we will assume in this book that the size of an
\texttt{int} is 4 bytes, or 32 bits, which is what \texttt{gcc} gives on both
kinds of machine. There is a keyword \texttt{signed} which, when
applied to an integral type, splits the range into a negative and a
non-negative part. Since an \texttt{int} is 32
bits, its range is from $-2^{31}$ to $2^{31} - 1$. By default
\texttt{short}, \texttt{int}, \texttt{long} and \texttt{long long} are
\texttt{signed}. Floats and doubles are
always signed and have no unsigned counterpart. When the integer is
\texttt{unsigned}, the positive range doubles and becomes $0$ to
$2^{32} - 1$. The largest and smallest values of an \texttt{int} are known as
\texttt{INT\_MAX} and \texttt{INT\_MIN}, and the largest value of an
\texttt{unsigned int} is \texttt{UINT\_MAX}. When the result of an
\texttt{unsigned} computation is outside this range, the value wraps around,
that is, it is reduced modulo \texttt{UINT\_MAX} $+ 1$. For signed integers,
going beyond \texttt{INT\_MAX} or \texttt{INT\_MIN} is undefined behavior and
must be avoided. These limits are macros defined in \texttt{limits.h},
which you can find in \texttt{/usr/include} or \texttt{/usr/local/include} by
default.
There are four different integer types based on their storage
requirement: \texttt{short int, int, long,} and \texttt{long long.} Short
integers are at least two bytes and are two bytes on most systems. A signed
two-byte short integer has a range of -32768 to
32767, while the unsigned version has a range of 0 to 65535. Plain integers,
i.e. \texttt{int}, have already been discussed. \texttt{long} has a
minimum storage requirement of 4 bytes. On many 64-bit Unix-like systems it is
large enough to represent all memory addresses of the system, because there
\texttt{size\_t} is defined as
\texttt{unsigned long.}
The \texttt{short, long} and \texttt{long long} qualifiers decrease or increase
the range of plain integers. With a 64-bit \texttt{gcc} on Linux,
\texttt{short int} will be 2 bytes, while
\texttt{long int} will be 8 bytes, which is equal to \texttt{long long
int}. On such systems \texttt{size\_t} is a typedef of
\texttt{unsigned long int}, which is capable of representing the size of any
object in memory. \texttt{size\_t} is the type of argument received by many
functions, including the memory allocation functions.
\section{Characters}
A \texttt{char} is 1 byte, which is \texttt{CHAR\_BIT} bits;
\texttt{CHAR\_BIT} is 8 on virtually all modern systems. So its signed
version, stored in 2's
complement where half the range is negative and half is non-negative, will have
values from -128 to 127. The two halves are not exact mirror images because
there is only one zero, which is counted with the positive numbers. If it had
been 1's complement, the range would have been from -127 to 127, but since
computers use 2's complement the range is
from $-2^7$ to $2^7 - 1.$ Whether a plain \texttt{char} is signed or unsigned
depends on the compiler. Note that chars are fundamentally integral types, and
the ASCII symbols are the first 128 numbers, or in other words they are 7-bit
numbers. So the character `0' is internally 48 in decimal, which is its
integral value; in other words, it is stored as the sequence of bits
representing \texttt{0x30} in hexadecimal. These integral values of characters
are known as ASCII values. A full table of ASCII values is given in Appendix A.
A simple program which takes a few characters as input and then prints them on
the console along with their ASCII values is given below:
\begin{minted}[frame=single]{c}
#include <stdio.h>

int main()
{
    char c = 0;
    char c1 = 0, c2 = 0;
    short int si = 0;

    printf("Enter a character on your keyboard and then press ENTER:\n");
    scanf("%c", &c);
    printf("The character entered is %c and its ASCII value is %d.\n", c, c);

    // There remains a '\n' in the stdin stream which needs to be cleared.
    getchar();

    // Adjacent string literals are joined into one string by the compiler.
    printf("Enter a pair of characters on your keyboard and then press "
           "ENTER:\n");
    scanf("%c%c", &c1, &c2);
    printf("The characters entered are %c and %c and their ASCII "
           "values are %d and %d respectively.\n", c1, c2, c1, c2);

    // The chars are added as integers; the sum fits in a short int.
    si = c1 + c2;
    printf("The sum of c1 and c2 as integers is %hd.\n", si);

    return 0;
}
\end{minted}
A sample run may have following output:
\\\\\texttt{Enter a character on your keyboard and then press ENTER:\\
\textbf{1}\\
The character entered is 1 and its ASCII value is 49.\\
Enter a pair of characters on your keyboard and then press ENTER:\\
\textbf{12}\\
The characters entered are 1 and 2 and their ASCII values are 49 and 50\\
respectively.\\
The sum of c1 and c2 as integers is 99.\\\\}
As you can see from the program, characters are internally stored as
integers, and we can even perform operations which we normally perform on
numbers, like addition as shown. We can perform other operations such as
subtraction, multiplication and division; however, most of the time only
addition or subtraction makes sense, to move forward or backward among the
characters of a class such as the digits or the letters. Multiplication and
division of characters by other characters or integers rarely make sense.
One problem of concern is the extra \texttt{\textbackslash n} in the input
stream. It does not cause trouble when reading integers, but when you want to
read characters, the newline produced by the \texttt{Enter} or \texttt{Return}
key, which may be left over from the last input, will cause
trouble. \texttt{\textbackslash n} is
recognized as a character and will be assigned to the next variable if it is in
\texttt{stdin.} One of the ways to remove it is to make a call to
\texttt{getchar}, which reads one character from the \texttt{stdin} stream.
\section{Sizes of Integer Types}
Before going any further and discussing floats and doubles, let us take a look
at the limits of the various integral types as given in the specification. Note
that the values on your compiler may be larger in magnitude but not smaller,
because these are minimum values. The limits given below, and the limits which
will be given for floating-point numbers, are defined in \texttt{<limits.h>} and
\texttt{<float.h>} respectively.
\begin{itemize}
\item[---] number of bits for smallest object that is not a bit-field (byte)\\
\texttt{CHAR\_BIT}\hfil\hspace*{2cm}\texttt{8}
\item[---] minimum value for an object of type \texttt{signed char}\\
\texttt{SCHAR\_MIN}\hfil\hspace*{2cm}\texttt{-127 //} $-(2^7 - 1)$
\item[---] maximum value for an object of type \texttt{signed char}\\
\texttt{SCHAR\_MAX}\hfil\hspace*{2cm}\texttt{+127 //} $2^7 - 1$
\item[---] maximum value for an object of type \texttt{unsigned char}\\
\texttt{UCHAR\_MAX}\hfil\hspace*{2cm}\texttt{255 //} $2^8 - 1$
\item[---] minimum value for an object of type \texttt{char}\\
\texttt{CHAR\_MIN}\hfil\hspace*{2cm}\textit{see below}
\item[---] maximum value for an object of type \texttt{char}\\
\texttt{CHAR\_MAX}\hfil\hspace*{2cm}\textit{see below}
\item[---] maximum number of bytes in a multibyte character, for any supported
locale\\
\texttt{MB\_LEN\_MAX}\hfil\hspace*{2cm}\texttt{1}
\item[---] minimum value for an object of type \texttt{short int}\\
\texttt{SHRT\_MIN}\hfil\hspace*{2cm}\texttt{-32767 //} $-(2^{15} - 1)$
\item[---] maximum value for an object of type \texttt{short int}\\
\texttt{SHRT\_MAX}\hfil\hspace*{2cm}\texttt{+32767 //} $2^{15} - 1$
\item[---] maximum value for an object of type \texttt{unsigned short int}\\
\texttt{USHRT\_MAX}\hfil\hspace*{2cm}\texttt{65535 //} $2^{16} - 1$
\item[---] minimum value for an object of type \texttt{int}\\
\texttt{INT\_MIN}\hfil\hspace*{2cm}\texttt{-32767 //} $-(2^{15} - 1)$
\item[---] maximum value for an object of type \texttt{int}\\
\texttt{INT\_MAX}\hfil\hspace*{2cm}\texttt{+32767 //} $2^{15} - 1$
\item[---] maximum value for an object of type \texttt{unsigned int}\\
\texttt{UINT\_MAX}\hfil\hspace*{2cm}\texttt{65535 //} $2^{16} - 1$
\item[---] minimum value for an object of type \texttt{long int}\\
\texttt{LONG\_MIN}\hfil\hspace*{2cm}\texttt{-2147483647 //} $-(2^{31} - 1)$
\item[---] maximum value for an object of type \texttt{long int}\\
\texttt{LONG\_MAX}\hfil\hspace*{2cm}\texttt{+2147483647 //} $2^{31} - 1$
\item[---] maximum value for an object of type \texttt{unsigned long int}\\
\texttt{ULONG\_MAX}\hfil\hspace*{2cm}\texttt{4294967295 //} $2^{32} - 1$
\item[---] minimum value for an object of type \texttt{long long int}\\
\texttt{LLONG\_MIN}\hfil\hspace*{2cm}\texttt{-9223372036854775807 //} $-(2^{63} - 1)$
\item[---] maximum value for an object of type \texttt{long long int}\\
\texttt{LLONG\_MAX}\hfil\hspace*{2cm}\texttt{+9223372036854775807 //} $2^{63} - 1$
\item[---] maximum value for an object of type \texttt{unsigned long long int}\\
\texttt{ULLONG\_MAX}\hfil\hspace*{2cm}\texttt{18446744073709551615 //} $2^{64} - 1$
\end{itemize}
If the value of an object of type \texttt{char} is treated as a signed integer
when used in an expression, the value of \texttt{CHAR\_MIN} shall be the same
as that of \texttt{SCHAR\_MIN} and the value of \texttt{CHAR\_MAX} shall be the
same as that of \texttt{SCHAR\_MAX}. Otherwise, the value of \texttt{CHAR\_MIN}
shall be 0 and the value of \texttt{CHAR\_MAX} shall be the same as that of
\texttt{UCHAR\_MAX}. The value \texttt{UCHAR\_MAX} shall equal $2^{CHAR\_BIT} -
1$.
The minimum values given above are symmetric with the maximum values, which
allows for 1's complement representation, in which positive zero and
negative zero are distinct. Computers in general use 2's
complement, so you will notice that the actual minimum values are lower by
1. For example, \texttt{SHRT\_MIN} in \texttt{<limits.h>} is typically -32768,
and the other minimum values are likewise one lower than shown above.
\section{Floating Types}
Floating-point representation is a lot more complicated in computers than it
is for us human beings. The C specification takes its description of
floating-point arithmetic from IEC 60559:1989, which is a standard for
floating-point arithmetic and is the same as IEEE 754. In C there are three
floating-point types: \texttt{float, double} and \texttt{long double.} They are
described in the specification in \S(iso.5.2.4.2.2).
A floating-point number is used to represent a real-world fractional value, and
it is a trade-off between range and accuracy because, as I said in
\ref{fractional
binary numbers}, a decimal fraction cannot be represented exactly in binary
unless the denominator of that number is an integral power of 2. A number is,
in general, represented approximately to a fixed number of significant digits
(the significand) and scaled using an exponent; the base for the scaling is
usually 2, 10 or 16. A number that can be represented exactly is of the
following form:
$$\text{significand} \times \text{base}^\text{exponent}$$
For example, $1.2345 = \underbrace{12345}_\text{significand} \times
\,\underbrace{10}_\text{base}\!\!\!\!\!\!^{\overbrace{-4}^\text{exponent}}$
The term floating point refers to the fact that a number's radix point (decimal
point, or, more commonly in computers, binary point) can ``float''; that is, it
can be placed anywhere relative to the significant digits of the number.
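The point about power-of-2 denominators can be seen with a few comparisons.
This is only a sketch: \texttt{sums\_exactly} is a helper name I have invented,
and the behavior described assumes that \texttt{double} is an IEEE 754 binary
type, which is the usual case but not guaranteed by C itself.

\begin{minted}[frame=single]{c}
#include <stdio.h>

/* Hypothetical helper: returns 1 when a + b compares equal to expected. */
int sums_exactly(double a, double b, double expected)
{
    return a + b == expected;
}

int main(void)
{
    /* 1/2 and 1/4 have power-of-2 denominators, so they are stored exactly. */
    printf("0.5 + 0.25 == 0.75 ? %d\n", sums_exactly(0.5, 0.25, 0.75));

    /* 1/10, 2/10 and 3/10 can only be approximated in binary. */
    printf("0.1 + 0.2  == 0.3  ? %d\n", sums_exactly(0.1, 0.2, 0.3));

    /* Printing more digits reveals the approximation. */
    printf("0.1 is stored as %.20f\n", 0.1);
    return 0;
}
\end{minted}

On such a system the first comparison holds while the second one does not,
which is why floating-point values should rarely be compared with \texttt{==}.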
\subsection{Representation of Floating-Point Numbers}
Given below are pictorial representations of 32-bit and 64-bit floating point
numbers:
\begin{figure}[H]
\begin{center}
\begin{tikzpicture}[node distance=1.8cm]
\foreach \x in {0, ..., 31}
\draw (\x*0.4cm, 0) -- +(.4cm, 0) -- +(.4cm, 0.5cm) -- +(0, .5cm) --
cycle;
\draw (0.2cm, 0.6cm) -- (0.2cm, 1cm);
\draw (0.6cm, 0.6cm) -- (0.6cm, 1cm) -- (3.4cm, 1cm) -- (3.4cm, 0.6cm);
\draw (3.8cm, 0.6cm) -- (3.8cm, 1cm) -- (12.6cm, 1cm) -- (12.6cm, 0.6cm);
\foreach \x in {31, ..., 0}
\node at (\x*0.4cm, 0) [xshift=.2cm, yshift=-.3cm, align=center] {\tiny \x};
\node at (0.2cm, 1.3cm) [align=center] {sign};
\node at (2cm, 1.3cm) [align=center] {exponent(8 bits)};
\node at (8.2cm, 1.3cm) [align=center] {fraction(23 bits)};
\end{tikzpicture}
\caption{32-bit floating-point numbers}
\label{fig:32-bit floating point numbers}
\end{center}
\end{figure}
Similarly, a 64-bit floating-point number has 1 bit for the sign, 11 bits for
the exponent and 52 bits for the fractional part. Zero is represented by all
exponent and fraction bits being 0; the sign bit may be either 0 or 1, which
gives positive and negative zero.
C also has the concept of positive and negative infinities. The sign bit is 0
for positive infinity and 1 for negative infinity. The exponent bits are all 1
while the fraction bits are all 0.
Certain operations are invalid, like dividing zero by zero or taking the
square root of a negative number, and they raise a floating-point exception.
The results of such operations are represented by NaNs, which stands for
``not a number''. The sign bit of a NaN may be either 0 or 1. The exponent bits
are all 1 and the fractional part is anything but all 0s, because that pattern
represents infinity.
There are also four rounding modes which we will see later.
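The layout described above can be inspected from a program. The following is a
sketch under two assumptions which are mine and not guaranteed by the
standard: that \texttt{float} is a 32-bit IEEE 754 number, and that dividing by
a floating-point zero yields infinity or NaN as IEEE 754 prescribes. The
helper names \texttt{float\_bits} and \texttt{show} are made up for this
illustration.

\begin{minted}[frame=single]{c}
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copies the bytes of a float into a 32-bit integer.
   Assumes float is a 32-bit IEEE 754 value. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Prints the sign, exponent and fraction fields of f. */
void show(float f)
{
    uint32_t bits = float_bits(f);
    printf("%12g  sign=%u  exponent=0x%02x  fraction=0x%06x\n",
           (double)f,
           (unsigned)(bits >> 31),
           (unsigned)((bits >> 23) & 0xffu),
           (unsigned)(bits & 0x7fffffu));
}

int main(void)
{
    float zero = 0.0f;

    show(1.0f);
    show(-2.0f);
    show(zero);
    show(1.0f / zero);   /* positive infinity on IEEE 754 systems */
    show(-1.0f / zero);  /* negative infinity on IEEE 754 systems */
    show(zero / zero);   /* a NaN on IEEE 754 systems */
    return 0;
}
\end{minted}

For the infinities you should see an exponent field of \texttt{0xff} with a
zero fraction, and for the NaN the same exponent with a non-zero fraction.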
Now let us see a program which reads floating-point numbers as input and
prints them.
\begin{minted}[frame=single]{c}
#include <stdio.h>

int main(void)
{
    float f = 0.0f;
    double d = 0.0;
    long double ld = 0.0L;

    printf("Enter a float, double and long double separated by space:\n");
    /* scanf returns the number of items it converted successfully */
    if (scanf("%f %lf %Lf", &f, &d, &ld) != 3) {
        printf("Invalid input\n");
        return 1;
    }
    printf("You entered %f %lf %Lf\n", f, d, ld);

    return 0;
}
\end{minted}
If you run this you might see the following output:
\\\\\texttt{Enter a float, double and long double separated by space:\\
\textbf{3.4 5.6 7.8}\\
You entered 3.400000 5.600000 7.800000\\\\}
By default these conversions print six digits after the decimal point, although
a \texttt{double} carries more precision than a \texttt{float}, as we have
studied. Now that we know the basic types let us learn a bit about
input/output.
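The number of digits printed after the decimal point can be changed with a
precision in the format specifier. Below is a small sketch;
\texttt{format\_fixed} is a helper name of my own, built on the standard
\texttt{snprintf} function, and it returns the number of characters written.

\begin{minted}[frame=single]{c}
#include <stdio.h>

/* Hypothetical helper: formats value with the given number of digits
   after the decimal point and returns the length of the result. */
int format_fixed(char *buf, size_t size, int digits, double value)
{
    return snprintf(buf, size, "%.*f", digits, value);
}

int main(void)
{
    char buf[64];
    float f = 5.6f;
    double d = 5.6;

    format_fixed(buf, sizeof buf, 6, d);
    printf("six digits (the default): %s\n", buf);
    format_fixed(buf, sizeof buf, 2, d);
    printf("two digits:               %s\n", buf);

    /* With more digits the lower precision of float becomes visible. */
    printf("float  to 15 places: %.15f\n", f);
    printf("double to 15 places: %.15f\n", d);
    return 0;
}
\end{minted}

The last two lines will usually differ after the seventh digit or so, because
a \texttt{float} keeps fewer significant digits than a \texttt{double}.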
\section{Boolean Data Type}
A boolean type has only two possible values: \texttt{true} and
\texttt{false}. Fundamentally a boolean is an integer. 0 is considered
\texttt{false} while all other values, including negative integers, are
treated as \texttt{true}. \texttt{true} and \texttt{false} are macros (which
we will see later) defined in \texttt{<stdbool.h>}, and they expand to 1 and 0
respectively. The type \texttt{bool} is also a macro, which expands to
\texttt{\_Bool}. Let us see a small program with booleans:
\begin{minted}[frame=single]{c}
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    bool bcpp = 4;
    _Bool bc = 5;
    bool True = true;
    _Bool False = false;
    bool bFalseCPP = -4;
    _Bool bFalseC = -7;

    printf("%d %d %d %d %d %d\n", bcpp, bc, True, False, bFalseCPP, bFalseC);

    return 0;
}
\end{minted}
and the output is:
\\\\\texttt{1 1 1 0 1 1\\\\}
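Note that \texttt{true} and \texttt{false} are the macros from
\texttt{<stdbool.h>} while \texttt{True} and \texttt{False} are ordinary
identifiers, since C is case-sensitive. Any non-zero value stored in a
\texttt{\_Bool}, including a negative one, becomes 1.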
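The conversion to \texttt{\_Bool} differs from the conversion to \texttt{int},
which the following sketch tries to show. The helper \texttt{to\_bool} is a
name I have invented for the illustration.

\begin{minted}[frame=single]{c}
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: converting a value to bool yields 0 if it
   compares equal to 0 and 1 otherwise. */
bool to_bool(double x)
{
    return x;
}

int main(void)
{
    printf("(int)0.5    = %d\n", (int)0.5);      /* fraction is discarded */
    printf("to_bool(0.5) = %d\n", to_bool(0.5)); /* non-zero, hence true */
    printf("to_bool(-4)  = %d\n", to_bool(-4));
    printf("to_bool(0)   = %d\n", to_bool(0));
    return 0;
}
\end{minted}

So a boolean is not simply a very small integer: converting 0.5 to
\texttt{int} gives 0, whereas converting it to \texttt{bool} gives 1.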
\section{Complex Type}
C99 introduced complex types. As you may know, a complex number has two parts:
real and imaginary. Each part is a floating-point number, i.e. it can be
represented by \texttt{float}, \texttt{double} or \texttt{long double}. The
header \texttt{<complex.h>} deals with complex numbers and provides many useful
functions to manipulate them, which are detailed in the reference. Let us see a
simple example:
\begin{minted}[frame=single]{c}
#include <stdio.h>
#include <complex.h>

int main(void)
{
    /* I is the imaginary unit macro from <complex.h> */
    double complex z = 4.0 + 3.0 * I;
    printf("Absolute value of z is %lf\n", cabs(z));

    double complex zConj = conj(z);
    printf("Imaginary part of conjugate is now %lf\n", cimag(zConj));

    return 0;
}
\end{minted}
Since the functions of \texttt{<complex.h>} live in the math library, we need
to link it to build this program, so the compilation command would look like
\texttt{\$gcc -o complex complex.c -lm}. With GCC on most Unix-like systems the
\texttt{-lm} part is required for linking this program. Let us see the output:
\\\\\texttt{Absolute value of z is 5.000000\\
Imaginary part of conjugate is now -3.000000\\\\}
I have shown only \texttt{cabs}, \texttt{conj} and \texttt{cimag} but there
are many more useful functions available for computations on complex numbers.
The way to declare a complex number is shown above. It is a combination of a
real part and an imaginary part, where the imaginary part is multiplied by the
imaginary unit $i = \sqrt{-1}$ (spelled \texttt{I} in standard C; GCC also
accepts the suffix \texttt{i} as an extension). Here, \texttt{cabs} computes
the absolute value of a complex number $x + iy$, which is given by
$\sqrt{x^2 + y^2}$. We will see the rest of the functions in the reference.
\section{Void and Enum}
The \texttt{void} type comprises an empty set of values; it is an incomplete
object type\footnote{An incomplete type is a type whose size is unknown.} that
cannot be completed. You cannot declare variables of type \texttt{void}. You
cannot declare an array of \texttt{void} type. Any declaration which requires
the size of the type to be known cannot have \texttt{void} as its type.
However, we can declare pointers to \texttt{void} because pointers do not
require the size of the pointed-to type to be known. For this reason
\texttt{void} pointers are used as generic pointers and are used to convert one
type of pointer into another. It is a low-level type and should be used
sparingly. We will see examples of the \texttt{void} type later in the book.
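As a small preview of generic pointers, here is a sketch of a function which
swaps two objects of any type. The name \texttt{swap\_bytes} is my own; the
caller has to pass the size because it cannot be recovered from a
\texttt{void} pointer.

\begin{minted}[frame=single]{c}
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: swaps n bytes between the objects a and b point
   to. The two objects must not overlap. */
void swap_bytes(void *a, void *b, size_t n)
{
    unsigned char *pa = a;  /* void * converts to other object pointers */
    unsigned char *pb = b;

    for (size_t i = 0; i < n; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}

int main(void)
{
    int i1 = 1, i2 = 2;
    double d1 = 1.5, d2 = 2.5;

    swap_bytes(&i1, &i2, sizeof i1);
    swap_bytes(&d1, &d2, sizeof d1);
    printf("%d %d\n", i1, i2);
    printf("%.1f %.1f\n", d1, d2);
    return 0;
}
\end{minted}

The same function works for \texttt{int} and \texttt{double} because it never
looks at the type, only at the bytes. That is also why it cannot catch a wrong
size, which is one reason to use \texttt{void} pointers sparingly.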
An enum type is an integral type whose members are used as symbolic constants.
An enumeration is a set of named integer values. You can perform every
operation on an enumeration member which you can perform on an integer. The
values start from zero by default and increment by one unless specified
explicitly. Given below is an example of the \texttt{enum} type.
\begin{minted}[frame=single]{c}
#include <stdio.h>

int main(void)
{
    typedef enum {zero, one, two} enum1;
    typedef enum {alpha=-5, beta, gamma, theta=4, delta, omega} enum2;

    printf("zero = %d, one = %d, two=%d\n", zero, one, two);
    printf("alpha = %d, beta = %d, gamma=%d, theta=%d, delta=%d, omega=%d\n",
           alpha, beta, gamma, theta, delta, omega);

    return 0;
}
\end{minted}
and the output is:
\\\\\texttt{zero = 0, one = 1, two=2\\
alpha = -5, beta = -4, gamma=-3, theta=4, delta=5, omega=6\\\\}
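Enumeration constants are most useful as readable labels, for instance in a
\texttt{switch} statement. The sketch below uses an enumeration
\texttt{weekday} and a helper \texttt{weekday\_name}, both of which are made up
for this illustration.

\begin{minted}[frame=single]{c}
#include <stdio.h>

/* Hypothetical enumeration: numbering starts at 1 instead of 0. */
enum weekday { MONDAY = 1, TUESDAY, WEDNESDAY };

/* Maps an enumeration member to a printable name. */
const char *weekday_name(enum weekday d)
{
    switch (d) {
    case MONDAY:    return "Monday";
    case TUESDAY:   return "Tuesday";
    case WEDNESDAY: return "Wednesday";
    }
    return "unknown";
}

int main(void)
{
    enum weekday d = TUESDAY;

    printf("%d is %s\n", d, weekday_name(d));
    /* Integer arithmetic works on enumeration members. */
    printf("%d is %s\n", d + 1, weekday_name(d + 1));
    return 0;
}
\end{minted}

Because the members are plain integers, nothing stops you from passing a value
such as 42 which has no name; the fallback \texttt{return} handles that case.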
\section{Literals}
There are four categories of constants: character, integer, floating-point and
enumeration constants. There are certain rules about constants. Commas and
spaces are not allowed except in character and string constants. Their value
cannot exceed the range of their data type. A numeric constant can be preceded
by a minus sign (-); strictly speaking the sign is the unary minus operator
applied to the constant.
Given below is an example of integer constants:
\begin{minted}[frame=single]{c}
#include <stdio.h>

int main(void)
{
    int decimal = 7;
    int octal = 06;
    int hex = 0xb;

    printf("%d %o %x\n", decimal, octal, hex);

    return 0;
}
\end{minted}
and the output is:
\\\\\texttt{7 6 b\\\\}
As you can see there are three different categories of integer constants:
decimal constants (base 10), octal constants (base 8) and hexadecimal constants
(base 16). Also, you must have noticed how a zero is prefixed to an octal
constant and \texttt{0x} to a hexadecimal constant. The \texttt{\%d} format
specifier is already known to you for signed decimals. However, now you know
two more, \texttt{\%o} and \texttt{\%x}, for unsigned octal and unsigned
hexadecimal respectively. For an unsigned decimal integer it is \texttt{\%u}.
There is one more format specifier which you may encounter for signed decimal
and that is \texttt{\%i}. Note that C99 has nothing for binary constants.
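The prefix changes only how a constant is written, not the value that is
stored. The following sketch writes the same number in all three bases and
prints it back; the \texttt{\#} flag, which is standard, makes \texttt{printf}
show the prefix.

\begin{minted}[frame=single]{c}
#include <stdio.h>

int main(void)
{
    int decimal = 255;
    int octal = 0377;   /* leading zero: base 8 */
    int hex = 0xff;     /* leading 0x: base 16 */

    /* All three variables hold the same value. */
    printf("%d %d %d\n", decimal, octal, hex);

    /* The same value printed in each base, without and with a prefix. */
    printf("%o %x %X\n", decimal, decimal, decimal);
    printf("%#o %#x\n", decimal, decimal);
    return 0;
}
\end{minted}

A common trap follows from this: \texttt{010} is not ten but eight, because the
leading zero makes it an octal constant.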
A floating-point constant is a base-10 number that contains a decimal point,
an exponent or both. Given below is an example of floating-point constants:
\begin{minted}[frame=single]{c}
#include <stdio.h>

int main(void)
{
    float f = 7.5384589234;
    double d = 13.89457883453857823;
    long double ld = 759.8263478234729;

    printf("%f %lf %Lf\n", f, d, ld);

    return 0;
}
\end{minted}
and the output is:
\\\\\texttt{7.538459 13.894579 759.826348\\\\}
For example 123456 can be written as any of 1.23456e5, 1.23456e+5, 1.23456E5,
.123456e6, 12.3456E+4 etc. The exponent must be an integer; it cannot be a
floating-point number.
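The different spellings can be compared directly in a program. The sketch
below also shows the suffixes \texttt{f} and \texttt{L}, which are not used in
the example above: an unsuffixed floating-point constant has type
\texttt{double}, and a suffix selects \texttt{float} or \texttt{long double}.

\begin{minted}[frame=single]{c}
#include <stdio.h>

int main(void)
{
    /* Four spellings of the same value. */
    double a = 123456.0;
    double b = 1.23456e5;
    double c = .123456e6;
    double d = 12.3456E+4;

    printf("%f %f %f %f\n", a, b, c, d);
    printf("all equal: %d\n", a == b && b == c && c == d);

    /* The suffix decides the type of the constant itself. */
    printf("sizeof 1.0f = %zu, sizeof 1.0 = %zu, sizeof 1.0L = %zu\n",
           sizeof 1.0f, sizeof 1.0, sizeof 1.0L);
    return 0;
}
\end{minted}

The sizes printed on the last line depend on your platform.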
A character constant is a single character enclosed in apostrophes. Some
examples of character constants are \texttt{`A', `T', `)', `?', ` '}. Commas
and blanks are allowed as character constants in apostrophes. Character
constants are fundamentally integers and all arithmetic operations can be
performed on them. Their values depend on the character set of the computer in
use. However, in this book we are concerned with the ASCII character set.
\begin{minted}[frame=single]{c}
// Character constants
// Description: Demo of character constants
#include <stdio.h>

int main(void)
{
    char c = 'S';
    // string literals must not be modified, hence const
    const char *str = "Shiv S. Dayal";

    printf("%c %s\n", c, str);

    return 0;
}
}
\end{minted}
and the output is:
\\\\\texttt{S Shiv S. Dayal\\\\}
Also, the string is accessed through a character pointer, that is, a pointer
to the memory location where its first character is stored. In this case the
pointer \texttt{str} itself is a local variable on the stack, while the
characters of the string literal have static storage, which is often placed in
read-only memory. For a string literal the compiler knows how much memory has
been allocated. A string ends with the null character, written
\texttt{\textbackslash{}0}, which is used to terminate the string. By using
this mechanism the program knows where the string terminates. It is treated in
the next section as well. A very interesting thing to note is that
\texttt{char} is considered to be an integral type, so it is allowed to perform
addition etc. on the \texttt{char} type. Till now you have learnt many format
specifiers and have seen that they all start with \texttt{\%}. Think how you
will print \texttt{\%} on stdout. It is printed as \texttt{\%\%}. The ASCII
table is a 7-bit character table with values ranging from 0 to 127. There is
also something called escape sequences and it is worth having a look at them.
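The points of this paragraph can be tried out in one short program. It is a
sketch which assumes the ASCII character set, and \texttt{to\_upper\_ascii} is
a helper I have written for the illustration; in real code you would use
\texttt{toupper} from \texttt{<ctype.h>}.

\begin{minted}[frame=single]{c}
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: converts a lowercase ASCII letter to uppercase
   and returns every other character unchanged. */
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - 'a' + 'A');
    return c;
}

int main(void)
{
    const char *str = "Shiv S. Dayal";

    /* strlen stops at the terminator; sizeof on the literal counts it. */
    printf("strlen = %zu, sizeof = %zu\n",
           strlen(str), sizeof "Shiv S. Dayal");

    /* char is an integral type, so arithmetic on it is allowed. */
    printf("'A' = %d, 'A' + 1 = %c, upper('s') = %c\n",
           'A', 'A' + 1, to_upper_ascii('s'));

    /* %% prints a single percent sign. */
    printf("100%% done\n");
    return 0;
}
\end{minted}

The difference of one between the two sizes on the first line is the
terminating null character.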
\section{Escape Sequences}
There are certain characters which cannot be typed directly from the keyboard
or which are not displayed as printing characters. Some of these are expressed
using \textit{escape sequences}. An escape sequence always begins with a
backslash and is followed by one or more characters. Given below is the table
of escape sequences:
\begin{table}[H]
\begin{center}
\caption{Escape Sequences}
\begin{longtable}{lcc}
\textbf{Character}&\textbf{Escape Sequence}&\textbf{ASCII Value}\\
bell (alert)&$\backslash$a&007\\
backspace&$\backslash$b&008\\
horizontal tab&$\backslash$t&009\\
vertical tab&$\backslash$v&011\\
newline (line feed)&$\backslash$n&010\\
form feed&$\backslash$f&012\\
carriage return&$\backslash$r&013\\
quotation mark(")&$\backslash$"&034\\
apostrophe(')&$\backslash$'&039\\
question mark(?)&$\backslash$?&063\\
backslash ($\backslash$)&$\backslash\backslash$&092\\
null&$\backslash$0&000
\end{longtable}
\end{center}
\end{table}
Now we will talk about these one by one. \texttt{\textbackslash{}0}, which is
known as the null character (\texttt{NUL}), is the string terminating
character, as said previously, and must be present in a string for it to
terminate. It should not be confused with the macro \texttt{NULL}, which is a
null pointer constant. For example, in our character constant program the
\texttt{str} string is \texttt{"Shiv S. Dayal"}. So how many characters are
there? 13? Wrong, 14! The null character is hidden. Even if we say
\texttt{str="";} then it will contain one character, and that is this null
character. Most string related C functions rely on the presence of the null
character, and a missing one causes a lot of mess. The bell escape sequence
asks the terminal to sound its bell. Let us write a program and see it in
effect.
\begin{minted}[frame=single]{c}
// Bell Program
// Description: Demo of bell escape sequence
#include <stdio.h>

int main(void)
{
    printf("hello\a");
    getchar();

    return 0;
}
\end{minted}
The output of this program will be \texttt{hello} on \texttt{stdout} and an
audible or visible bell, depending on the settings of your terminal. Notice the
\texttt{getchar()} function which waits for input and reads a character from
\texttt{stdin}. Next is the backspace escape sequence. Let us see a program
demonstrating it as well:
\begin{minted}[frame=single]{c}
// Backspace Program
// Description: Demo of backspace escape sequence
#include <stdio.h>

int main(void)
{
    printf("h\b*e\b*l\b*l\b*o\b*\n");
    printf("\b");
    getchar();

    return 0;
}
\end{minted}
and the output is:
\\\\\texttt{*****}\\\\
It is \texttt{hello} with each character replaced by \texttt{*}. A minor
modification to this program, replacing each character with some other
character as soon as a key is pressed, will turn it into a password program.
The backspace escape sequence moves the cursor to the previous position on the
current line. If the active position of the cursor is the initial position
then the C99 standard does not specify the behavior of the display device.
However, the behavior on my system is that the cursor remains at the initial
position. Check it out on yours. The second \texttt{printf} call exercises
this behavior.
Next we are going to deal with the newline and horizontal tab escape sequences
together, as combined they are used to format output neatly. The program is
listed below:
\begin{minted}[frame=single]{c}
// Newline and Horizontal tab Program
// Description: Demo of newline and horizontal tab escape sequence
#include <stdio.h>

int main(void)
{
    printf("Before tab\tAftertab\n");
    printf("\nAfter newline\n");
    getchar();

    return 0;
}
\end{minted}
and the output is:
\\\\\texttt{Before tab~~~~Aftertab}\\\\
\texttt{After newline}\\\\
Here I leave you to experiment with other escape sequences. Feel free to
explore them. Try various combinations; let your creative juices flow.
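As a starting point for your experiments, here is a sketch which uses a few of
the remaining escape sequences from the table. The numeric values it prints
assume the ASCII character set.

\begin{minted}[frame=single]{c}
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* An escape sequence is a single character inside a string. */
    printf("length of \"a\\tb\" is %zu\n", strlen("a\tb"));

    /* Quotation marks and backslashes must be escaped in literals. */
    printf("She said \"hi\" and typed C:\\temp\n");
    printf("apostrophe: %c, question mark: %c\n", '\'', '\?');

    /* Values of a few escape sequences, as listed in the table. */
    printf("\\a=%d \\b=%d \\t=%d \\n=%d \\r=%d \\0=%d\n",
           '\a', '\b', '\t', '\n', '\r', '\0');
    return 0;
}
\end{minted}

Compare the numbers on the last line with the ASCII values in the table of
escape sequences.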