
Tuesday, November 16, 2010

Defensive Programming

Overview

Defensive programming is a technique in which you assume the worst about all input.

  • The biggest danger to your application is user input.
    • It's uncontrolled, unexpected and unpredictable.
  • The input sent to your application could be malicious.
    • Or it could just be something you never expected.
  • Debugging takes a lot of time.

 

First Rule

Never Assume Anything!

Two kinds of assumptions: 

  • Expected input
    • Data from the user cannot be trusted. As such, all input must be validated.
    • For each input:
      • Define the set of all legal input values.
      • When receiving input, validate against this set.
      • Determine the behavior when input is incorrect:
        • Terminate
        • Retry
        • Warning
  • The programmer assuming something about a programming language
    • Some primitive data types have different sizes depending on the operating system and the hardware platform. For example, integers have been 8, 16, 32 and 64 bits wide. Assuming the size of a data type can be disastrous when the code moves to a different platform.
    • In C, the ranges of the integer types are defined in limits.h. In addition, C has the sizeof operator, which yields the size of a type or variable in bytes.
    • You need to be especially careful with integer operations.
short x = 10000 * 10;
Will x overflow? Where short is 16 bits, yes: the result 100000 exceeds SHRT_MAX (32767), so x cannot hold it.
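One way to avoid guessing is to check the platform's limits at run time. The sketch below uses only limits.h and sizeof; the helper names short_bits and mul_fits_short are hypothetical and are not part of the original post. It assumes that long long is at least twice as wide as short, which holds on common platforms.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Width of short in bits on this platform; not assumed to be 16. */
size_t short_bits(void)
{
    return sizeof(short) * CHAR_BIT;
}

/*
 * Hypothetical helper: multiply a and b in a wider type and report
 * whether the product fits in a short.
 * Returns true and stores the product in *out when it fits.
 * Returns false and leaves *out untouched when it does not.
 * Assumption: long long is at least twice as wide as short.
 */
bool mul_fits_short(short a, short b, short *out)
{
    long long product = (long long)a * (long long)b;

    if (product < SHRT_MIN || product > SHRT_MAX) {
        return false;
    }
    *out = (short)product;
    return true;
}
```

With a 16-bit short, mul_fits_short(10000, 10, &x) returns false instead of silently storing a wrong value.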


Testing:

  • Just testing that it works is not good enough.
  • Error cases need to be tested too, to check that the application reacts appropriately. Add tests for different kinds of input:
  1. illogical
  2. strange ASCII characters
  3. numerical/non-numerical
  4. positive/negative/0
  5. too large/small
  6. only composed of numbers/letters
  7. border values
  • Ask other people to test your application
    • First start with the technical testers
    • Then ask non-technical people
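To tie the validation steps from the first rule to these test categories, here is a minimal sketch of a validated input routine. The function parse_percent, its status codes and the legal range 0..100 are all hypothetical, chosen only for illustration. It relies on the standard strtol function to detect non-numeric and out-of-range text.

```c
#include <errno.h>
#include <stdlib.h>

/* Possible outcomes of validating one input string. */
typedef enum {
    INPUT_OK = 0,
    INPUT_NOT_A_NUMBER,  /* empty, letters, control characters, trailing junk */
    INPUT_OUT_OF_RANGE   /* numeric, but outside the legal set */
} input_status;

/* Hypothetical legal set for this example: integers 0..100. */
#define PERCENT_MIN 0L
#define PERCENT_MAX 100L

/*
 * Validates text against the legal set and stores the value in *out.
 * *out is written only when INPUT_OK is returned.
 * Note: strtol skips leading whitespace, so " 50" is accepted.
 */
input_status parse_percent(const char *text, long *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || out == NULL || *text == '\0') {
        return INPUT_NOT_A_NUMBER;
    }

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0') {
        return INPUT_NOT_A_NUMBER;   /* no digits, or junk after them */
    }
    if (errno == ERANGE || value < PERCENT_MIN || value > PERCENT_MAX) {
        return INPUT_OUT_OF_RANGE;   /* too large or too small */
    }
    *out = value;
    return INPUT_OK;
}
```

The caller then decides what to do with each status: terminate, retry or warn. Test inputs worth trying include "0" and "100" (border values), "-1" and "101" (just outside), "abc", "12abc", the empty string, and a number far too large for a long.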
Second Rule

Use standards!

  • Proper coding standards address weaknesses in the language standard and/or compiler design.
  • They define a format or "style" used for writing code. Every software development team should have an agreed-upon and formally documented coding standard. Coding standards make code more coherent and easier to read, which reduces the likelihood of bugs. They cover a wide range of topics:
  1. Variable naming
  2. Indentation
  3. Position of brackets
  4. Content of header files
  5. Function declaration
  6. Use of constants/magic numbers
  7. Macro definitions
Third Rule

Keep code as simple as possible!

  • Complexity breeds bugs.
  • Software should only contain the features it needs.
  • Proper planning is key to keeping your application simple.
  • Functions should be seen as a contract
    • Given input, they execute a specific task.
    • They should not do anything other than that specific task.
    • If they cannot execute that task, they should provide some kind of indicator so that the caller can detect the error.
      • Throw an exception (not available in C)
      • Set a global error value
      • Return an invalid value
        • NULL?
        • False?
        • Negative number?
  • Refactoring
    • It is not a bug-fixing technique, but it is a good way to battle feature creep:
      • Features are often added during development
      • These late additions are often the source of problems
    • Refactoring fights this by forcing the programmer to reevaluate the structure of the program.
    • It can help you keep your application simple.
  • Third-party libraries
    • Code reuse is not just a smart choice, it's a safe choice. Odds are that an established library has proven itself and is much more stable than anything you could build in the short term. Although code reuse is highly recommended, many questions must be addressed before using someone else's code:
      • Does this do exactly what I need?
      • How much will I need to change my design?
      • How stable is it? What reputation does it have?
      • How old is the code?
      • Who built it?
      • Are people still using it? Can I get help?
      • How much documentation is there?
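Returning to the idea of a function as a contract: in C, where exceptions are not available, the error indicator is usually an invalid return value, a global error value, or both. The sketch below combines a negative return value with the standard errno variable. The function checked_div is hypothetical and only illustrates the pattern.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Contract: store num / den in *result and return 0.
 * If the task cannot be done, return -1 (a value the caller can test
 * for), set errno, and leave *result untouched.
 */
int checked_div(int num, int den, int *result)
{
    if (result == NULL || den == 0) {
        errno = EDOM;            /* arguments outside the function's domain */
        return -1;
    }
    if (num == INT_MIN && den == -1) {
        errno = ERANGE;          /* quotient does not fit in an int */
        return -1;
    }
    *result = num / den;
    return 0;
}
```

Keeping the status in the return value and the data in an output parameter avoids the ambiguity of "is NULL, false or a negative number a legal result?" raised above.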
Case Study

(on function PF_SafeStrCpy)

1. Initial version:

/*
** This function copies the null terminated string str2 to str1.
** Str1 is truncated, if the length of str2 is greater than or equal to
** the length of str1. The parameter len1 must contain the length value
** of str1 (sizeof).
*/
/* this function replaces the system one due to QAC */
IM_string PF_SafeStrCpy (IM_string str1,
                         IM_cstring str2,
                         IM_word len1)
{
    IM_word len2;

    /* strlen returns the length without the terminating '\0' -> + 1 */
    len2 = strlen(str2) + 1;

    if (len2 > len1)
    {
        str1[len1 - 1] = '\0';
        return strncpy(str1, str2, len1 - 1);
    }
    else
    {
        str1[len2] = '\0';
        return strncpy(str1, str2, len2);
    }
}

2. Problem discovered: A simple test proves that this function is not so safe though:

char str1[100];
char str2[100]; /* strlen(str2) == 99 */

The call PF_SafeStrCpy(str1, str2, 100); gives len2 == 100, which is not greater than len1, so it goes through the else branch and executes str1[100] = '\0', writing one byte past the end of str1.

3. Actual corrected version:

/**
 * \see PF_Basic.h
 * corrected version of PF_SafeStrCpy; the origin is from project SAR
 */
char* PF_SafeStrCpy (char* str1, const char* str2, IM_word len)
{
    if ( 0 == len ) {
        return NULL;
    }
    else {
        char* cptr = strncpy(str1, str2, len);
        str1[len-1] = '\0';
        return cptr;
    }
}
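The boundary case from step 2 can be re-run against the corrected version. The sketch below reproduces the function so that it compiles on its own. The typedef for IM_word is an assumption, since its real definition is not shown in the post, and the helper boundary_case_ok is hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int IM_word;   /* assumption: the real typedef is not shown */

/* Corrected version from the post, reproduced so the example compiles. */
char *PF_SafeStrCpy(char *str1, const char *str2, IM_word len)
{
    if (0 == len) {
        return NULL;
    }
    else {
        char *cptr = strncpy(str1, str2, len);
        str1[len - 1] = '\0';
        return cptr;
    }
}

/*
 * Re-runs the boundary case: a 100-byte destination and a source with
 * strlen == 99. A guard byte placed right after the destination shows
 * whether anything was written past its end.
 * Returns 1 if the copy is complete, terminated and in bounds.
 */
int boundary_case_ok(void)
{
    char src[100];
    char buf[101];               /* 100-byte destination + 1 guard byte */

    memset(src, 'a', 99);
    src[99] = '\0';
    buf[100] = 'G';              /* guard byte */

    if (PF_SafeStrCpy(buf, src, 100) != buf) {
        return 0;
    }
    return strlen(buf) == 99 && strcmp(buf, src) == 0 && buf[100] == 'G';
}
```

The same harness run against the initial version would overwrite the guard byte, which is the overflow described in step 2.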

4. Remarks on actual version:

Attack possibilities:

  • Passing NULL as "src" (str2) or "dest" (str1) can easily cause the program to terminate, thereby enabling a DoS attack.
  • Passing a "count" (len) larger than the real size of the destination buffer makes strncpy write past its end, causing a buffer overflow.

5. General solution:

As a rule, the destination buffer must be at least large enough to hold the specified maximum number of characters (not bytes) plus the terminating null character.

Follow these rules for safe use of strncpy():

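  1. Verify that src and dest are not NULL.
  2. Set the final character of dest to the null character before copying.
  3. Use strncpy(dest, src, sizeof(dest)/sizeof(dest[0])). This works only where dest is declared as an array; if dest is a pointer, sizeof(dest) is the size of the pointer, so the buffer size must be passed in separately.
  4. If the final character of dest (index sizeof(dest) - 1) is no longer null after the copy, src did not fit: the result is truncated and not null-terminated, so restore the terminator.

These steps are effective, but correct use still depends on passing the right sizes.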
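The four rules can be wrapped in one function so that callers cannot forget a step. This is only a sketch: the name checked_strncpy and its status codes are hypothetical, and the destination size is passed explicitly because sizeof cannot recover it from a pointer.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    COPY_OK = 0,
    COPY_TRUNCATED,   /* src did not fit; dest holds a terminated prefix */
    COPY_BAD_ARG      /* NULL pointer or zero-sized destination */
} copy_status;

/*
 * Hypothetical wrapper around strncpy. dest_size is the element count of
 * dest, i.e. sizeof(dest)/sizeof(dest[0]) evaluated where dest is still
 * an array.
 */
copy_status checked_strncpy(char *dest, size_t dest_size, const char *src)
{
    if (dest == NULL || src == NULL || dest_size == 0) {
        return COPY_BAD_ARG;                      /* rule 1 */
    }

    dest[dest_size - 1] = '\0';                   /* rule 2 */
    strncpy(dest, src, dest_size);                /* rule 3 */

    if (dest[dest_size - 1] != '\0') {            /* rule 4 */
        dest[dest_size - 1] = '\0';               /* restore the terminator */
        return COPY_TRUNCATED;
    }
    return COPY_OK;
}
```

A typical call is checked_strncpy(buf, sizeof buf, input) with buf declared as a char array; the caller then decides whether a truncated copy is acceptable.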

6. Observations:

  • strlen() vs sizeof()

This code:

char String[]="Hello";

printf("\n Size of string: %zu. String length: %zu.\n", sizeof(String), strlen(String) );

... prints: Size of string: 6. String length: 5.

  • Non-null-terminated string:

This code:

char String[5] = { 'H', 'e', 'l', 'l', 'o' };

printf("\n Size of string: %zu. String length: %zu.\n", sizeof(String), strlen(String) );

... printed, in one run: Size of string: 5. String length: 19. (The reported length is arbitrary. strlen() keeps reading past the end of the array until it happens to find a null byte, which is undefined behavior.)

  • strlen() with pointers:

This code:

char *ptr = "Hello";

printf("For ptr: sizeof = %zu, strlen = %zu.\n", sizeof ptr, strlen(ptr));

... prints: For ptr: sizeof = 4, strlen = 5. (ptr is a pointer, so sizeof(ptr) is the size of the pointer: 4 bytes here, typically 8 on a 64-bit platform.)

  • strlen() is NOT safe to call!
    • Unless you positively know that the string IS null-terminated.
    • When you call strlen() on an improperly terminated string:
      • strlen() scans until a null character is found
      • It can scan outside the buffer if the string is not null-terminated
      • This can result in a segmentation fault or bus error
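When the size of the buffer is known, the scan can be bounded so that it never leaves the buffer. The sketch below uses the standard memchr function; the helper name bounded_strlen is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns the string length if a null terminator is
 * found within the first buf_size bytes, or buf_size if none is found.
 * A return value equal to buf_size therefore means "not terminated".
 */
size_t bounded_strlen(const char *buf, size_t buf_size)
{
    const char *nul;

    if (buf == NULL || buf_size == 0) {
        return 0;
    }
    nul = memchr(buf, '\0', buf_size);   /* never reads past buf_size bytes */
    return (nul != NULL) ? (size_t)(nul - buf) : buf_size;
}
```

For the unterminated String[5] array shown earlier, bounded_strlen(String, sizeof(String)) returns 5, which equals the buffer size and so flags the missing terminator.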
Good external links
  1. Defensive programming on Wikipedia
  2. Secure programmer: Developing secure programs article by David Wheeler
  3. Secure programmer: Countering buffer overflows article by David Wheeler
  4. Proactive Debugging article by Jack Ganssle
  5. Secure Programming for Linux and Unix HOWTO -- Creating Secure Software free book by David Wheeler
  6. Programming issues: Buffer overflows
