This article is an experiment with C and some of
the concepts of COR (computer organization). Not everything that is
taught turns out exactly the same in an implementation.
The experiment was run on the following system:
OS | Fedora x86_64 |
---|---|
Processor | Intel(R) Core(TM)i5-2540M CPU |
System type | 64 bit |
gcc | 4.7.0 |
Try the following code [ 1 ]
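#include <stdio.h>

struct Coord {
    long x, y;
};

/* Reaches into main's frame: 40 is the distance in bytes from &x to pt. */
void canItModify() {
    long *x;
    x = (long *)((char *)&x + 40);
    (*x) = 0;   /* overwrites pt.x */
    x++;
    (*x) = 0;   /* overwrites pt.y */
}

int main(int argc, char *argv[]) {
    struct Coord pt;
    scanf("%ld%ld", &pt.x, &pt.y);
    canItModify();
    printf("%ld %ld\n", pt.x, pt.y);
    return 0;
}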
Compile and run it.
You may be surprised to find that, on the setup above, whatever
input you give, printf always prints 0 0.
Now try this [ 2 ]
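#include <stdio.h>

struct Coord {
    long x, y;
};

/* Same trick, but pt is now two frames up, so the distance grows to 72. */
void canItModify() {
    long *x;
    x = (long *)((char *)&x + 72);
    (*x) = 0;   /* overwrites pt.x */
    x++;
    (*x) = 0;   /* overwrites pt.y */
}

void womanIntheMiddle() {
    int ar[3];
    scanf("%d%d%d", ar, ar + 1, ar + 2);
    canItModify();
}

int main(int argc, char *argv[]) {
    struct Coord pt;
    scanf("%ld%ld", &pt.x, &pt.y);
    womanIntheMiddle();
    printf("%ld %ld\n", pt.x, pt.y);
    return 0;
}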
Compile and run it.
You will see the same result again. Even a
"womanIntheMiddle" can't stop "canItModify" from reaching the
variable pt in main.
Clearly the core of the two programs above is
"canItModify()": the instructions in this function are what
alter the pt variable of the main function.
If a variable that is locally scoped to one function can be
modified by another function that has no direct access to it,
what does the scope of a variable really mean?
In the words of Dennis M. Ritchie, "If a variable is locally scoped to a function, no other function can
have direct access to it".
Note that this says only that a function can't
have direct access to a variable locally scoped to another function, i.e. a
function might still have indirect access to a local variable of another function.
In the rest of this article I explain how an external
function can access the variables of other functions without taking them as
parameters.
A good understanding of C and pointer arithmetic is a must for further
reading. If you are weak in either of these, study them first.
WARNING :- The method
explained in this article is not a standard programming practice, so do not get
into the habit of using it. It relies on behaviour that the C standard leaves undefined. This is just for fun and understanding.
Let's swim through the architecture and investigate.
How are local variables allocated space?
A program loaded into memory for execution has 4 memory
sections.
- Code Segment :- Non-modifiable section that contains the code.
- Data Segment :- Contains global and static variables.
- Heap :- Dynamic memory is allocated from here.
- Stack :- Holds function call information and space for local variables.
We won't discuss CS, DS or the heap, only stack memory.
The local variables of all functions live on the stack, and each function uses
only the variables locally scoped to it.
So how can a function be tricked into thinking that its
variable is in another function, and access it? The answer is that
a function can't be tricked this way, because the relative addresses of local variables
are hard-coded in the machine instructions after compilation, and those are in the
code segment. But there is a way out.
Here is what happens when a function call is made (the numbers
correspond to a 64 bit machine; they vary for 32 bit and so on):
- The return address (the address of the next instruction) is pushed onto the stack (8 byte).
- The stack pointer is decremented (8 byte).
- The base pointer of the previous function is pushed onto the stack (8 byte), which decrements the stack pointer by another 8 byte.
- Memory for local variables is allocated (16*n, where n is a positive integer; this depends on the compiler implementation, and for the specification above it is 16*n).
- The stack pointer is decremented (16*n).
Now the question is, "Exactly how much memory is allocated, and where do the variables reside?"
The answer is that enough memory is allocated to hold
all the local variables in a way that keeps memory look-up efficient. The
variables are placed in this space in the order of their declaration,
starting from the bottom of the space allocated for the local variables
of the current function, i.e. if an int is declared first, it is given space first,
and the other variables follow.
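Note :- If a function takes arguments, the first 6 integer or pointer
arguments are passed in the registers rdi, rsi, rdx, rcx, r8, r9 respectively
(the System V x86-64 calling convention). Any further arguments are passed on the
stack: the caller pushes them before the call, so they sit above the return
address rather than in the callee's local space. Without optimization gcc copies
the register arguments into the local space of the function, which is why argc
and argv take up room in main's frame in the calculations below.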
Here is the stack memory structure of the 2nd example given above.
(Each row in the stack segment is 8 byte long and ___ is blank space;
sfp (8 byte) => base pointer of the previous function, ret (8 byte) => return address.)
fig :- stack growing from higher to lower addresses and heap from lower to higher
How are variables placed in the local frame?
A 64 bit machine can read 8 byte in one memory cycle,
from a location that is a multiple of 8. The compiler tries to place the
variables so that reading them from memory takes the minimum number of
memory cycles.
eg :- for a function "dummyfunction",
void dummyfunction() {
    int i;   // int takes 4 byte
    long j;  // long takes 8 byte
    int k;   // int takes 4 byte
    .....
    .....
}
Ideally it should take 16B (4+8+4), which is also a multiple of 16,
but it will take 32B.
Explanation
Suppose the base address from where variables are placed is 0.
- First, space for i (int) is allocated. As an int takes 4 byte, it occupies 4 to 0.
- Next, space for j (long) is allocated. As it takes 8 byte, it could be assigned 12 to 4, but then reading this one long variable would take 2 memory cycles (16-8 and 8-0), because a memory read cannot start in between. If it is assigned 16-8, reading it takes just one memory cycle. Therefore j is assigned 16-8.
- The next variable k could be assigned 8-4, but the compiler does not make this arrangement by itself; you have to order the declarations yourself. The compiler places k at 20-16.
As the requirement is 20 byte, 32 byte (the next multiple of 16) is allocated for the local
variables of this function.
Note :- The space allocated for the local variables of each function
is separated by a gap of 16 byte (sizeof(ret)+sizeof(sfp)) from the local variable space of
the previous function.
Calculation for example 1 :
- main needs 32 byte for local variables on the stack: 16 byte for the variable pt (2 long = 8*2 = 16), then 4 byte for an int (argc) and 8 byte for a pointer (argv; an address takes 8 byte on a 64 bit machine).
- When "canItModify" is called, an 8 byte return address and an 8 byte stack frame pointer are placed on the stack.
- "canItModify" allocates 16 byte for local variables, of which x requires 8 byte (a pointer needs 8 byte).
- The bottom 8 byte are assigned to x.
Therefore the distance of &x (in "canItModify") from pt (in "main") is
8 (size of x)+16 (sizeof(ret)+sizeof(sfp))+16 (space for argv+space for argc)=40.
Calculation for example 2 :
- main needs 32 byte for local variables on the stack: 16 byte for the variable pt (2 long = 8*2 = 16), then 4 byte for an int (argc) and 8 byte for a pointer (argv; an address takes 8 byte on a 64 bit machine).
- When "womanIntheMiddle" is called, an 8 byte return address and an 8 byte stack frame pointer are placed on the stack.
- The local variables in "womanIntheMiddle" need 12 byte (3*sizeof(int)=3*4=12 byte), therefore 16 byte are allocated for them.
- When "canItModify" is called, an 8 byte return address and an 8 byte stack frame pointer are placed on the stack.
- "canItModify" allocates 16 byte for local variables, of which x requires 8 byte (a pointer needs 8 byte).
- The bottom 8 byte are assigned to x.
Therefore the distance of &x (in "canItModify") from pt (in "main") is
8 (size of x)+16 (sizeof(ret)+sizeof(sfp) between "canItModify" and "womanIntheMiddle")+16 (local variables in "womanIntheMiddle")+16 (sizeof(ret)+sizeof(sfp) between "womanIntheMiddle" and "main")+16 (space for argv+space for argc)=72.
The conclusion
Now that we have seen how space for local variables is
allocated and how much space lies between function calls, it is easy to calculate the exact
distance between any two variables in a function call sequence. As all these
variables are in a linear address space and function scope is just an illusion, it is possible to access any variable anywhere in any function.
This article is built from my own experiments
with "C" and from reading various articles and books on the internet. Suggestions
and ideas are always welcome. Good luck with this new toy.