Sunday, August 19, 2012

Scope of variables in C. An illusion ?


The following article is an experiment with C and some of the concepts of COR (computer organization).Well, not everything that is taught, comes out to be exactly same in implementation.
Following experiment is done on

OS Fedora x86_64
Processor Intel(R) Core(TM)i5-2540M CPU
System type 64 bit
gcc 4.7.0

Try the following code  [ 1 ]

struct Coord {
                        long x,y;
};
void canItModify(){
                        long *x;
                        x=(long*)((char*)&x+40);
                        (*x)=0;
                        x++;
                        (*x)=0;
}
int main(int argc,char *argv[]) {
                        struct Coord pt;
                        scanf("%d%d",&pt.x,&pt.y);
                        canItModify();
                        printf("%d %d\n",pt.x,pt.y);
                        return 0;
}

Compile and run it.
Well, you will be surprised by the fact that no matter whatever you give as an input printf will always print 0 0.

Now try this  [ 2 ]

struct Coord {
                        long x,y;
};
void canItModify(){
                        long *x;
                        x=(long*)((char*)&x+72);
                        (*x)=0;
                        x++;
                        (*x)=0;
}
void womanIntheMiddle() {
                        int ar[3];
                        scanf("%d%d%d",ar,ar+1,ar+2);
                        canItModify();
}
int main(int argc,char *argv[]) {
                        struct Coord pt;
                        scanf("%d%d",&pt.x,&pt.y);
                        womanIntheMiddle()
                        printf("%d %d\n",pt.x,pt.y);
                        return 0;
}

compile and run.
Now again you will see the same result. Even a "womanIntheMiddle" can't stop "canItModify" to meet variable pt in main.

Now it is obvious that core of the above two program is "canItModify()".And it is instructions in this function that is altering the pt variable of the main function.Well if a variable, locally scoped to some function, can be modified by other function anywhere in the code that doesn't have direct access to that variable then what about the scope of the variable.In words of Dennis M. Ritchie, "If a variable is locally scoped to a function, no other function can have direct access to it".It is worth noticing that it is said that the function can’t have direct access to a variable locally scoped to another function i.e; a function might have an indirect access to local variable of another function.Further in this article I am going to explain how an external function can access variables of other functions without taking them as parameter.

Understanding of C and pointer arithmetic is a must for further reading. If you are weak in any one of these first read those.


WARNING :- Method explained in this article is not a standard programming practice so do not get into habit of using it. This is just for fun and understanding.

Lets Swim through the architecture and investigate.

How Local variables are allocated space ?

There are 4 memory sections for a program when loaded in memory to execute.
  1. Code Segment    :- Non-modifiable section that contains code.
  2. Data Segment     :- Contains global and static variable
  3. Heap                  :- Dynamic memory is allocated from here.
  4. Stack                 :- function call and spaces for local variable.
We won’t be discussing CS, DS and heap but stack memory. Local variables of all functions are there on the stack and each of them use variable locally scoped to those functions.So how can a function be tricked to make it think that it's variable is in another function and access it? Answer to this question is that a function can't be tricked to do this, as relative addresses to local variables are hard-coded in the machine instruction after compilation, which is in the code segment. But there is a way-out.
Here is what happen when a function call is made (Numb3rs are corresponding to 64 bit machine, It will vary for 32 bit and so on)
  1. Return address of Next instruction is pushed onto the stack (8 byte)
  2. stack pointer decremented (8 byte).
  3. Base pointer of previous function is pushed on the stack (8 byte).
  4. Memory for local variable is allocated (16*n , n is a +ve integer, it depends on compiler implementation, as per above specification it is 16*n).
  5. stack pointer is decremented (16*n).
Now the question is,"How much exact memory is allocated and where all variables reside?".
Answer to this question is that memory sufficient enough for all the local variables such that memory look-up is efficient. These variables reside in the memory space, allocating spaces for the variable declared first i.e; if an int is declared first, it will be given space first then other variables, in the order of their declaration from the bottom of actual space allocated for local variable, of the current function.

Note :- If a function is taking some arguments, first 6 arguments are assigned to registers rdi,rsi,rdx,rcx,r8,r9 respectively, all other arguments are stored in stack after all local variables are assigned their space on local stack space. Read from here.

Here is stack memory structure of 2nd example as given above (Each row  in Stack segment is 8 byte long and ___ is blank space, sfp (8 byte)=> base pointer of previous function, ret(8 byte) => return address)

fig :- stack growing from higher to lower and heap from lower to higher

How variables are placed in local frame ?
A 64 bit machine can read 8 byte in one memory cycle and this is from a location that is multiple of 8. Compiler tries to place the variables in such a way so that it takes minimum memory cycle to read those variables from memory.

eg :- for a function "dummyfunction",
void dummyfunction() {
                int i;     //int takes 4 byte
                long j;    //long takes 8 byte
                int k;     //int takes 4 byte
                .....
                .....
}
Ideally it should take 16B and it is also multiple of 16 but it will take 32B.

Explanation

Suppose base address from where variables are placed is 0.
  1. First space for an int will be allocated.As int takes 4 byte it will be from 4 to 0.
  2. Now space for j (long) will be allocated.As it takes 8 byte, it can be assigned space from 12 to 4 but this assignment has a problem that it will take 2 memory cycle to read one long variable (16-8 and 8-0) as memory read is not from in between. If it is assigned space from 16-8, it will take just one memory cycle to read this variable. Therefore j is assigned space from 16-8.
  3. Next variable k can be assigned @ 8-4 but compiler is not that intelligent for making such arrangement you will have to arrange the variable in order. Compiler will place k from 20-16.
As requirement is 20 it will assign 32 byte space for local variable of given function.

Note :- Space allocated for local variable of each function has a gap of 16 byte (sizeof(ret)+sizeof(sfp)) from the local variable space of previous function.

Calculation for example 1 :
  • main will need 32 byte for local variables in stack. 16 byte for variable pt (2 long = 8*2 = 16) after that 4 byte for int (argc) and 8 byte for pointer (argv = address takes 8 byte in 64bit machine).
  • When "canItmodify" function will be called there will be a 8 byte return address and a 8 byte stack frame pointer on stack.
  • "canItModify" will allocate 16 byte for local variable and x will require 8 byte (pointer need 8 byte).
  • Bottom 8 byte will be assigned to x.
Therefore the distance of &x (In "canInModify") from pt (In "main") will be 
8 (size of x)+16 (sizeof(ret)+sizeof(sfp)+16 (space for argv+space for argc)=40.

Calculation for example 2 :
  • main will need 32 byte for local variables in stack. 16 byte for variable pt (2 long = 8*2 = 16) after that 4 byte for int (argc) and 8 byte for pointer (argv = address takes 8 byte in 64bit machine).
  • When "womanInthMiddle" will be called there will be a 8 byte return address and a 8 byte stack frame pointer.
  • local variable in "womanIntheMiddle" will need 12 byte (3*sizeof(int)=3*4=12 byte), therefore 16 byte will be allocated for local variable in it.
  • When "canItModify" will be called  there will be a 8 byte return address and a 8 byte stack frame pointer.
  • "canItModify" will allocate 16 byte for local variable and x will require 8 byte (pointer need 8 byte).
  • Bottom 8 byte will be assigned to x.
Therefore the distance of &x (in "canItModify") from pt (In "main") will be
8 (size of x)+16 (sizeof(ret)+sizeof(sfp) between "canItModify" and "womanIntheMiddle")+16 (local variable in "womanIntheMiddle")+16 (sizeof(ret)+sizeof(sfp) between "womanIntheMiddle" and "main")+16 (space for argv+space for argc)=72 

The conclusion
Now as we have seen how space for local variables are allocated and spaces between function calls, it is easy to calculate the exact distance between any two variables in a function call sequence. As all these variables are in a linear address space and function scope is just an illusion, it is possible to access any variable anywhere in any function.
The above article has been constructed by my own experiment with “C” and reading various article and books through internet. Any suggestion and idea is always welcome. Wish you good luck with this new toy.