Code Auditing and Reverse Engineering

 






Author: Michael Cross
Published: January 2007
Copyright: 2007
Publisher: Syngress
 




 

Designing a program from scratch allows you to incorporate security from the beginning, or at least be familiar enough with the program to reason about potentially vulnerable areas in the code. However, as an administrator or developer, you may face various other situations: You may have joined a development project already in progress, thus inheriting someone else’s code. Or you have made the decision to use third-party code (such as an open source library or CGI application). Or, as an administrator, you’re worried about the quality of code your internal developers are putting on your system.

In all these situations, it really helps to be able to quickly and efficiently review the code for problems. You don’t have to be a programmer extraordinaire to perform a basic code review; even if you can’t follow some of the specific programming nuances, you can at least raise red flags for later review by a more knowledgeable individual. The goal of this chapter is for any computer-literate individual to be able to take an already-developed piece of code and determine if it has fundamental security problems. We provide you with a detailed list of problem areas pertaining to various popular programming languages, and show you how to use such a list in assessing the source code of a Web application. First, we look at how to efficiently trace through a program, effectively giving you a game plan on where to start. Then, we survey some particularly popular programming languages used for Web application programming, followed by a long list of problem areas and the details associated with each language.


HOW TO EFFICIENTLY TRACE THROUGH A PROGRAM

Let’s face it: There are not enough hours in the day for some things. Spending a few days reviewing piles of source code looking for potential security problems is definitely inefficient, not to mention time consuming (unless you’re being paid to do it). If it’s a small program with a linear logic flow (that is, the program isn’t highly interactive, nor does it contain a lot of branching logic), the task may not be that hard; however, if the program is of moderate size, reviewing it can be a headache. This headache is compounded if the source code is distributed among multiple components, contained in multiple files. Starting at the beginning of the program and then stepping through every possible execution path becomes nearly impossible.

This chapter illustrates a different technique for approaching source code reviews. Rather than trace the program forward through execution, we take the reverse approach: proceed directly to the potential problem areas, and then trace back through the program to confirm whether they are vulnerable. Technically, we’re only interested in the execution paths that involve the user; however, trying to follow those paths can be excruciating because data supplied by a user can go every which way after the program starts processing it. So instead, we start at the end and then trace the flow in reverse to see if we encounter a user path. Thus, the emphasis is really on looking for vulnerabilities that involve user-supplied data in some way, shape, or form.

NOTE:
When reviewing code, we don’t need to bother looking at areas where the program internally generates the data, because we assume the program will not try to exploit itself.

The logic behind this approach is simple and best illustrated with an example. Say you had a program that queried the user for a set of particular numeric values. The program then proceeded to perform a large (possibly superfluous) amount of calculations on those values, incorporating values submitted from other users (pulled from a database), calculating and correlating various trends, and finally storing the results in a database record.

Now, the code to perform those calculations may be complex, intense, and exhausting to try to step through. However, from a security standpoint, it’s easy: We can, for the most part, ignore it. We’re not here to make sure the program works as intended; we’re here to find potential vulnerabilities. Taking that example, we can narrow it down to three potential problem areas:
  • Initial data supplied by the user (and its validity)
  • Reading of additional values from the database during the processing
  • Storing of the final result into the database
The values supplied by the user should be initially checked to see if they are valid data types (in this case, they are all numeric). Looking at the point of data entry (when the data is received from the user) will determine this.
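
As a rough illustration of the data type check described above, the following C sketch accepts a user-supplied string only if it is a whole decimal number within an expected range. The function name parse_user_number and the range parameters are our own assumptions for the example, not part of any particular application.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse user input as a decimal integer in [min, max].
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 * Note that strtol() itself tolerates leading whitespace and a sign. */
int parse_user_number(const char *text, long min, long max, long *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return -1;               /* empty input is not a number */

    errno = 0;
    value = strtol(text, &end, 10);

    if (errno == ERANGE)
        return -1;               /* value does not fit in a long */
    if (end == text || *end != '\0')
        return -1;               /* no digits at all, or trailing junk */
    if (value < min || value > max)
        return -1;               /* outside the range the program expects */

    *out = value;
    return 0;
}
```

When reviewing, the point of data entry is where you would hope to find a check of this kind applied to every numeric field before the value is used anywhere else.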

The intermediary values must be read from the database safely. Looking specifically at the SQL/database queries made lets you see if they (potentially) use any user-supplied data in the actual query; if they don’t, they can be considered "controlled," and thus safe.

Tools & Traps...
Fill Your Toolbox
The grep command-line tool is extremely useful. grep is a UNIX-originated tool used to search files (particularly text files) for particular strings of text. It can output the actual context where the specified string was found, associated line numbers, surrounding lines of text, and so on. You can also tell grep to search multiple files. This makes grep a useful, albeit simplistic, tool. Because grep has many different implementations, we recommend GNU grep: it’s free and packed full of useful features/options. Versions of grep compiled for the Windows platform exist as well (although the "find" command shipped with Windows provides the same general functionality). GNU grep is available for download from www.gnu.org/software/grep/.

Other tools to review source code can readily be found on the Internet. A popular tool is SourceEdit from Brixoft (www.brixoft.net). SourceEdit allows you to review source code for the most common programming languages (C/C++, C#, Visual Basic, Pascal, Java, ASP, PHP, Perl, Cold Fusion, SQL, HTML, CSS, and XML). If you want to review code that isn’t natively supported by SourceEdit, you can either install language files or create new ones using its Language Editor. It also includes a wide range of useful features, including code completion, function list, a hex editor, and other custom tools.
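
To make the search-first idea concrete, here is a minimal C sketch of the kind of check a grep-style search performs: it reports whether a line of source text mentions one of a few higher-risk C calls. The list of names is an illustrative subset we chose for the example, and find_risky_call is a hypothetical helper, not a real tool.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset of higher-risk C calls; far from exhaustive. */
static const char *const risky_calls[] = {
    "strcpy(", "strcat(", "sprintf(", "gets(", "scanf("
};

/* Returns the first listed call found in `line`, or NULL if none is found.
 * Plain substring matching over-reports (for example, "gets(" also matches
 * "fgets("), which is acceptable when the goal is only to flag lines for
 * a closer look. */
const char *find_risky_call(const char *line)
{
    size_t i;

    for (i = 0; i < sizeof risky_calls / sizeof risky_calls[0]; i++) {
        if (strstr(line, risky_calls[i]) != NULL)
            return risky_calls[i];
    }
    return NULL;
}
```

A real review would normally rely on grep itself; the sketch only shows that the technique is a simple text search, with the judgment left to the reviewer.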

Storing the result should be done in a secure manner. This is a matter of looking at the construction of the SQL/database query used to store the result. As long as the result is properly controlled and filtered, the database update can be considered safe. And thus, we have just performed a brief security review of the application, without having to deal with all that complex calculation logic. Obviously this method isn’t foolproof; however, it still stands as an efficient approach for individuals who are not programming savvy.

As with any code review, this approach assumes you have all the source available for the application in question. There are times when an application may use external libraries or components. If you don’t have the source to these components, you are limited to two options: meticulously inspecting all data given to and received from the external library/program (reducing the potential for problems within the external portion), or blindly trusting it. Which route you choose depends on the circumstances. You can probably trust system libraries, but be suspicious of other third-party code. When in doubt, go with your instincts. If your instincts are failing you, then be paranoid instead and don’t trust it; you can never be too cautious.

We will also take a programmatic approach; that is, we will focus on the actual (mis)uses of certain functions and the programming language in general. We do not focus on logic-based security flaws, because they require the expertise of knowing exactly what a program is attempting to do, how it implements that logic, where it is making assumptions, and where it might fail. And of course, all of those items vary from one application to the next, because they are dependent on how the application was coded in the first place. Any programmer could take an infinite number of directions to solve a problem, and attempting to make a security checklist of where each method contains problems (logically) is a definite exercise in futility. If you must tend to such areas, we recommend a review by a professional security reviewer skilled in the programming language of your application.


AUDITING AND REVIEWING SELECTED PROGRAMMING LANGUAGES

Many programming languages are available on the market today. Due to the explosion of Web application development, there even happen to be a few Web-centric ones. Choosing the right language is a black art; each has its pros and cons when it comes to being used for Web applications. This chapter doesn’t care about the actual usefulness and appropriateness of each language; instead, we concern ourselves only with aspects that relate to efficient code auditing.

Java

Java code can come in many flavors: self-contained applications, mobile applets, beans, or even scriptable via Java Server Pages (JSP) and JavaScript. From this point on, when we refer to "Java," we are referring to a bytecode compiled application, applet, or bean; JavaScript and JSP will be considered separate (due to the characteristics of what you would look for).

The "core" Java language basically consists of logic control statements and class/package manipulation routines. The actual functionality is contained in various external packages and classes, which are imported when needed. This aspect provides a useful benefit to you as a reviewer: if the package/class is not imported or otherwise loaded, you don’t have to worry about any potential security problems associated with items in that package/class. For example, you don’t have to check for file-related vulnerabilities if the java.io package(s) are not imported. (Keep in mind that a class can also be referenced by its fully qualified name without an import statement, so search for those references as well.) You can find more information on Java in Chapter 7, "Securing Your Java Code."

Java Server Pages

Java Server Pages (JSP), as mentioned earlier, are a scriptable version of Java that can be embedded inline within the appropriate HTML document. JSP also has hooks to interface with other server-side Java applets and beans. The JSP language itself is fairly limited, serving more as "glue" between HTML and server-side Java applications. However, in the seemingly Java-crazed world we currently live in (which has nothing to do with the proliferation of Starbucks coffee shops), JSP has become the latest rage.

Active Server Pages

In the Microsoft world, the actual scripting language behind Active Server Pages (ASP) is VBScript. However, there are various third-party ASP emulators like Sun Java System Active Server Pages (formerly Sun ONE and Chili!ASP) that technically are not VBScript; therefore, we refer to the language simply as ASP.

ASP is a Visual Basic/VBScript derivative with a structure similar to Java; that is, the basic language implements logic control statements, and all other functionality is contained in external objects. This allows you to selectively look for vulnerability areas based on what objects are being used by the code (as with Java). Keep in mind that to ease programmability, the Application, ObjectContext, Request, Response, Server, and Session objects are automatically available in every script (that is, they do not have to be imported).

Server Side Includes

Server Side Includes (SSI) were the ancestor of embedded inline server-side application languages. SSI basically provides the simple functionality to include external files, execute programs, and display variable contents within an HTML file. ASP actually incorporates SSI functionality automatically; this needs to be kept in mind when auditing ASP Web applications.

SSI commands follow the simple format of <!--#command options-->, where command would be the SSI operation (such as include, exec, and so on), and options are various values that determine what the command is supposed to do.

Python

Python is a flexible object-oriented scripting language. Although the core Python interpreter implements basic functionality and logic control, many functions are contained in external modules, which have to be explicitly imported. Again, like Java and ASP, this allows you to more efficiently audit the source code based on which modules are imported.

The Tool Command Language

The Tool Command Language (Tcl) scripting language uses a natural language syntax, which makes coding scripts more intuitive and easy to read. Although Tcl (pronounced tickle) is typically used with its graphical counterpart, the associated toolkit called Tk, Tcl has also been used by Web programmers for CGI applications. Like several of the previously mentioned languages, Tcl imports various functionality from external modules.

Practical Extraction and Reporting Language

Practical Extraction and Reporting Language (Perl) is a scripting language originally implemented on UNIX platforms. In the past, it was a popular language to use for CGI applications; however, the newer embedded scripting languages such as ASP, JSP, ColdFusion, and PHP have definitely encroached on its reign. To make up for this, newer offshoot Perl projects actually embed Perl into Apache (via mod_perl) and IIS (via a Perl plug-in).

Perl implements a lot of functionality within the core language; however, Perl is extensible via external modules. Although you could be selective on what you audit based on imported modules, there is enough risk in the core language’s functionality that makes it imperative that you check for all problem areas.

PHP: Hypertext Preprocessor

PHP (PHP: Hypertext Preprocessor) is a server scripting language popular on the UNIX platform, which has also become popular on Windows systems. PHP commands are embedded inline similar to ASP and JSP. PHP doesn’t use dynamic-loading modules; instead, all modules are included at the time the PHP engine is compiled. This means that all functions are available at the application’s runtime, forcing you to look for the entire breadth of vulnerable functions (you can’t take shortcuts based on imported packages and modules, as in Java and ASP).

C/C++

C is the classic "workhorse" language, with its more modern object-oriented C++ derivative. The most recent variation is C#, which Microsoft released as a newer language in the C family. C and C++ are very powerful languages, allowing low-level system access in many places. However, this power comes at a price: C and C++ can be quite complex and ruthless. You have to meticulously make sure everything is allocated, of the right size, and deallocated when finished; no automatic variable expansion or garbage collection exists to make your life easier.

NOTE:
Technically, some C++ classes (for example, the standard library’s std::string and std::vector) do handle automatic variable expansion (making the variable larger when there’s too much data to put in it). Garbage collection, however, is not part of standard C++, and the third-party classes that provide it vary widely in features. C does not use such classes.

C/C++ can prove mighty challenging for you to thoroughly audit, due to the extensive control an application has and the amount of things that could potentially go wrong. Our best advice is to take a deep breath and plow forth, tackling as much as you can in the process.

ColdFusion

ColdFusion is an inline HTML embedded scripting language by Allaire. Similar to JSP, ColdFusion scripting looks much like HTML tags; therefore, you need to be careful you don’t overlook anything nestled away inside what appears to be benign HTML markup. ColdFusion is a highly database-centric language: its core functionality is mostly comprised of database access, formatted record output, and light string manipulation and calculation. However, ColdFusion is extensible via various means (Java beans, external programs, objects, and so on), so you must always keep tabs on what external functionality ColdFusion scripts may be using. You can find more information on ColdFusion in Chapter 10, "Securing ColdFusion."


LOOKING FOR VULNERABILITIES

What follows is a collection of problem areas and the specific ways you can look for them. The majority of the problem areas are based on a single principle: use of a function that interacts with user-supplied data. Realistically, you will want to look at every such function, but doing so may require too much time. Therefore, we have compiled a list of the "higher risk" functions that remote attackers have been known to use to take advantage of Web applications.

Because the attacker will masquerade as a user, we only need to look at areas in the code that are influenced by the user. However, you also have to consider other untrusted sources of input into your program that influence program execution: external databases, third-party input, stored session data, and so on. You must consider that another poorly coded application may insert tainted SQL data into a database, which your application would be unfortunate enough to read and potentially be vulnerable to.

Getting the Data from the User

Before we start tracing problems in reverse, the first (and most important, in our opinion) step is to zoom directly to the section of code that accepts the user’s data. Ideally, all data collection from the user is centralized in one spot; in practice, however, bits and pieces may be received from the user as the application progresses (typical of interactive applications). Centralizing all user data input into one section (or a single routine) serves two important functions: it allows you to see exactly what pieces of data are accepted from a user and what variables the program puts them in, and it allows you to centrally filter incoming user data for illegal values.

For any language, first check to see if any of the incoming user data is put through any type of filtering or sanity checks. Hopefully, all data input is done at a central location, with the filtering/checking done immediately thereafter. The more fragmented an application’s approach to filtering becomes, the more chances a variable containing user data will be left out of the filtering mechanism(s). Also, knowing ahead of time which variables contain user-supplied data simplifies following the flow of user data through a program.
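
A central filter of the kind described here can be quite small. The following C sketch is one possible shape, assuming an allow-list of letters, digits, underscore, and hyphen; the name is_clean_field and the allowed character set are our own choices for illustration, and a real application would pick rules to suit each field.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical central check: accept only values of 1 to max_len characters
 * drawn from letters, digits, '_' and '-'.
 * Returns 1 if the value is acceptable, 0 otherwise. */
int is_clean_field(const char *value, size_t max_len)
{
    size_t i;

    if (value == NULL || value[0] == '\0')
        return 0;                /* missing or empty value */

    for (i = 0; value[i] != '\0'; i++) {
        unsigned char c = (unsigned char)value[i];

        if (i >= max_len)
            return 0;            /* longer than the field allows */
        if (!isalnum(c) && c != '_' && c != '-')
            return 0;            /* character outside the allow-list */
    }
    return 1;
}
```

If every incoming variable passes through one routine like this, the reviewer has a single place to inspect instead of hunting for scattered checks.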

NOTE:
Perl refers to any variable (and thus any command using that variable) containing user data as "tainted." Thus, a variable is tainted until it is run through a proper filter/validity check. We will use the term tainted throughout the chapter. Perl actually has an official "taint" mode, activated by the -T command-line switch. When activated, the Perl interpreter will abort the program when a tainted variable is used in an unsafe operation. Perl programmers should consider using this handy security feature.

Looking for Buffer Overflows

Buffer overflows are one of the top flaws for exploitation on the Internet today. A buffer overflow occurs when a particular operation/function writes more data into a variable (which is actually just a place in memory) than the variable was designed to hold. The result is that the data starts overwriting other memory locations without the computer knowing those locations have been tampered with. To make matters worse, some hardware architectures (such as Intel and Sparc) use the stack (a place in memory for variable storage) to store function return addresses. Thus, the problem is that a buffer overflow will overwrite these return addresses, and the computer, not knowing any better, will still attempt to use them. If the attacker is skilled enough to precisely control what values the return addresses are overwritten with, he can control the computer’s next operation(s).

The two flavors of buffer overflows referred to today are "stack" and "heap." Local variable storage (variables defined within a function) is referred to as "stack" because the variables are actually stored on the stack in memory. Heap data is the memory that is dynamically allocated at runtime, such as by C’s malloc() function. This data is not actually stored on the stack, but somewhere amidst a giant "heap" of temporary, disposable memory used specifically for this purpose. Actually exploiting a heap buffer overflow is much more involved, because there are no convenient return addresses (as there are on the stack) to overwrite.

Luckily, however, buffer overflows are only a problem with languages that must predeclare their variable storage sizes (such as C and C++). ASP, Perl, and Python all have dynamic variable allocation: the language interpreter itself handles the variable sizes. This is rather handy, because it makes buffer overflows a moot issue (the language will increase the size of the variable if there’s too much data). However, C and C++ are still widely used languages (especially in the UNIX world), and therefore buffer overflows are not going to disappear anytime soon.

NOTE:
More information on regular buffer overflows can be found in an article by Aleph1 entitled Smashing the Stack for Fun and Profit. A copy is available online at www.insecure.org/stf/smashstack.txt. Information on heap buffer overflows can be found in the "Heap Buffer Overflow Tutorial" by Shok, available at www.w00w00.org/files/articles/heaptut.txt.


The str* Family of Functions

The str* family of functions (strcpy(), strcat(), and so on) are the most notorious: they all will copy data into a variable with no regard to the variable’s length. Typically, these functions take a source (the original data) and copy it to a destination (the variable).

In C/C++, you have to check all uses of the functions strcpy(), strcat(), strcadd(), strccpy(), streadd(), strecpy(), and strtrns(). Determine if any of the source data incorporates user-submitted data, which could be used to cause a buffer overflow. If the source data does include user-submitted data, you must ensure that the maximum length/size of the source (data) is smaller than the destination (variable) size.

If it appears that the source data is larger than the destination variable, you should then trace the exact origin of the source data to determine if the user could potentially use this to his advantage (by giving arbitrary data used to cause a buffer overflow).
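
The length comparison described above can be sketched in C as follows. The helper copy_if_fits is hypothetical; it simply shows the check you are hoping to find in front of any strcpy()-style copy of user data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including the
 * terminating null byte. Returns 0 on success, -1 if src is too long. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0 || len >= dst_size)
        return -1;               /* would overflow: reject instead of copying */

    memcpy(dst, src, len + 1);   /* len + 1 copies the terminator as well */
    return 0;
}
```

A bare strcpy(dst, src) with no such test nearby, where src can be traced back to the user, is exactly the pattern to flag.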

The strn* Family of Functions

A safer alternative to the str* family of functions is the strn* family (strncpy(), strncat(), and so on). These are essentially the same as the str* family, except they allow you to specify a maximum length (or a number, hence the n in the function name). Properly used, these functions specify the source (data), destination (variable), and maximum number of bytes, which must be no more than the size of the destination variable! Therein lies the danger: Many people believe these functions to be foolproof against buffer overflows; however, buffer overflows are still possible if the maximum number specified is larger than the destination variable.

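In C/C++, look for the use of strncpy() and strncat(). You need to check that the specified maximum value is equal to or less than the destination variable size; otherwise, the function is prone to potential overflow just like the str* family of functions discussed in the preceding section. Two details deserve extra attention: strncpy() does not add a terminating null byte when the source fills the limit, and the limit given to strncat() counts only the characters appended, so it must fit within the space still unused in the destination (less one byte for the terminator).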
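
As a point of comparison when auditing, here is a minimal sketch of how the limits are usually computed when these two functions are used carefully. The wrapper names bounded_copy and bounded_append are our own, and bounded_append assumes the destination already holds a properly terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size - 1 bytes of src and always terminate the result. */
void bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';    /* strncpy() does not terminate on truncation */
}

/* Append as much of src as fits in the space still unused in dst. */
void bounded_append(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 >= dst_size)
        return;                  /* no room left for even one character */
    strncat(dst, src, dst_size - used - 1);  /* strncat() adds the terminator */
}
```

Code that passes the full buffer size to strncat(), or that never terminates the result of strncpy(), deserves the same scrutiny as a plain strcpy().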

NOTE:
Technically, any function that allows for a maximum limit to be specified should be checked to ensure the maximum limit isn’t set higher than it should be (in effect, larger than the destination variable has allocated).

The *scanf Family of Functions

The *scanf family of functions "scans" an input source, looking to extract various variables as defined by the given format string. This leads to potential problems if the program is looking to extract a string from a piece of data, and it attempts to put the extracted string into a variable that isn’t large enough to accommodate it.

First, you should check to see if your C/C++ program uses any of the functions scanf(), sscanf(), fscanf(), vscanf(), vsscanf(), or vfscanf().

If it does, you should look at the use of each function to see if the supplied format string contains any character-based conversions (indicated by the s, c, and [ tokens). If the format specified includes character-based conversions, you need to verify that the destination variables specified are large enough to accommodate the resulting scanned data.

NOTE:
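The *scanf family of functions allows for an optional maximum limit to be specified. This is given as a number between the conversion token % and the format flag (for example, %15s). This limit works much like the limit found in the strn* family of functions, except that it does not count the terminating null byte that the s and [ conversions add, so it must be at least one less than the destination size.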
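
The following C sketch shows a width-limited string conversion. The record layout (a name followed by an age), the buffer size NAME_SIZE, and the function parse_record are assumptions made up for the example.

```c
#include <stdio.h>

#define NAME_SIZE 16

/* Extract a name and an age from a line such as "alice 30".
 * Returns 1 if both fields were read, 0 otherwise. */
int parse_record(const char *line, char name[NAME_SIZE], int *age)
{
    /* %15s stores at most 15 characters plus the terminating null byte,
     * which exactly fills the 16-byte buffer. The width in the format
     * string must be kept in step with NAME_SIZE by hand. */
    return sscanf(line, "%15s %d", name, age) == 2;
}
```

With a bare %s in place of %15s, an overlong first field would be written straight past the end of name; that difference is what to look for in each format string.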

Other Functions Vulnerable to Buffer Overflows

Buffer overflows can also be caused in other ways, many of which are very hard to detect. The following list includes some other functions that populate a variable/memory address with data, making them susceptible to overflow. Some miscellaneous functions to look for in C/C++ include:
  • memcpy(), bcopy(), memccpy(), and memmove(): Similar to the strn* family of functions (they copy/move source data to destination memory/variable, limited by a maximum value). As with the strn* family, you should evaluate each use to determine if the maximum value specified is larger than the space allocated for the destination variable/memory.
  • sprintf(), snprintf(), vsprintf(), vsnprintf(), swprintf(), and vswprintf(): Allow you to compose multiple variables into a final text string. You should determine that the sum of the variable sizes (as specified by the given format) does not exceed the maximum size of the destination variable. For snprintf() and vsnprintf(), the maximum value should not be larger than the destination variable’s size.
  • gets() and fgets(): Read in a string of data from an input stream. The gets() function takes no limit at all, so it can always read in more data than the destination variable was allocated to hold. The fgets() function requires a maximum limit to be specified; therefore, you must check that the fgets() limit is not larger than the destination variable size.
  • getc(), fgetc(), getchar(), and read(): When used in a loop, these can read in too much data if the loop does not stop once the maximum destination variable size is reached. You will need to analyze the logic used in controlling the total loop count to determine how many times the code loops using these functions.
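
For the last two items in the list, a correctly bounded read might look like the following C sketch. The helper names read_line and read_word are hypothetical, and the choice to stop a word at a space or newline is only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read one line with fgets(), using the buffer's real size as the limit.
 * Returns 0 on success, -1 on end of input or error. */
int read_line(FILE *in, char *buf, size_t buf_size)
{
    if (buf_size == 0 || fgets(buf, (int)buf_size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline, if any */
    return 0;
}

/* Read characters one at a time, stopping before the buffer is full.
 * Returns the number of characters stored (not counting the terminator). */
size_t read_word(FILE *in, char *buf, size_t buf_size)
{
    size_t n = 0;
    int c;

    if (buf_size == 0)
        return 0;
    /* n + 1 < buf_size always leaves room for the terminating null byte. */
    while (n + 1 < buf_size && (c = getc(in)) != EOF && c != ' ' && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

When auditing a loop like the one in read_word, the condition to verify is the one that compares the running count against the buffer size; if it is missing or off by one, the loop can write past the end of the destination.
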
Checking the Output Given to the User

Most applications will, at one point or another, display some sort of data to the user. You would think that the printing of data is a fundamentally secure operation; but alas, it is not. Particular vulnerabilities exist that have to do with how the data is printed, and what data is printed.

Format String Vulnerabilities

Format string vulnerabilities are a class of vulnerability that arises from the *printf family of functions (printf(), fprintf(), and so on). This class of functions allows you to specify a "format" in which the provided variables are converted into string format.

NOTE:
Technically, the flaws described in this section lead to a form of buffer overflow attack, but we are classifying them under this category because the misuse typically involves printf(), vprintf(), and related functions normally used for output.

The vulnerability arises when an attacker is able to specify the value of the format string. Sometimes, this is due to programmer laziness. The proper way of printing a dynamic string value would be:
printf("%s",user_string_data); 
However, a lazy programmer may take a shortcut approach.
printf(user_string_data); 
Although this does indeed work, a fundamental problem is involved: The function is going to look for formatting commands within the supplied string. The user may supply data the function believes to be formatting/conversion commands, and via this mechanism she could cause a buffer overflow due to how those formatting/conversion commands are interpreted. (Actual exploitation is a little involved and beyond the scope of this chapter; suffice it to say that it definitely can be done and is being done on the Internet as we speak.)

NOTE:
You can find more information on format string vulnerabilities in an analysis written by Tim Newsham, available online at http://comsec.theclerk.com/CISSP/FormatString.pdf.

Format string bugs are, again, seemingly limited to C/C++. While other languages have *printf functionality, their handling of these issues may exclude them from exploitation. For example, Perl is not vulnerable (which stems from how Perl actually handles variable storage). So, to find potential vulnerable areas in your C/C++ code, you need to look for the functions printf(), fprintf(), sprintf(), snprintf(), vprintf(), vfprintf(), vsprintf(), vsnprintf(), wsprintf(), and wprintf(). Determine if any of the listed functions have a format string containing user-supplied data. Ideally, the format string should be static (a predefined, hard-coded string); however, as long as the format string is generated and controlled internal to the program (with no user intervention), it should be safe.

Home-grown logging routines (syslog, debug, error, and so on) tend to be culprits in this area. They sometimes hide the actual avenue of vulnerability, requiring you to backtrack through function calls. Imagine the following logging routine (in C):
void log_error (char *error){ 
    char message[1024]; 
    snprintf(message,1024,"Error: %s",error); 
    fprintf(LOG_FILE,message); 
} 
Here we have fprintf() taking the message variable as the format string. This variable is composed of the static string "Error: " and the error message passed to the function. (Notice the proper use of snprintf() to limit the amount of data put into the message variable; even if it’s an internal function, it’s still good practice to safeguard against potential problems.)

So, is this a problem? Well, that depends on every use of the log_error() function. So now you should go back and look at every occurrence of log_error(), evaluating the data being supplied as the parameter.
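
For comparison, one way such a routine could be written so that its callers no longer matter is sketched below. This is our own variation on the example, not code from any particular application; the name log_error_safe and the FILE * parameter (used in place of the LOG_FILE global so the sketch is self-contained) are assumptions.

```c
#include <stdio.h>

/* Variation on the logging routine: the caller-supplied text is only ever
 * passed as an argument to a fixed format, never as the format itself. */
void log_error_safe(FILE *log, const char *error)
{
    char message[1024];

    snprintf(message, sizeof message, "Error: %s", error);
    fputs(message, log);         /* fputs() does no format interpretation */
}
```

Writing fprintf(log, "%s", message) would work equally well. Either way, any % sequences in the error text are written out literally instead of being interpreted.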

Cross-Site Scripting

Cross-site scripting (CSS) is a particular concern due to its potential to trick a user. CSS is basically due to Web applications taking user data and printing it back out to the user without filtering it. It’s possible for an attacker to send a URL with embedded client-side scripting commands; if the user clicks on this Trojaned URL, the data will be given to the Web application. If the Web application is vulnerable, it will give the data back to the client, thus exposing the client to the malicious scripting code. The problem is compounded by the fact that the Web application may be in the user’s trusted security zone; thus the malicious scripting code is not subject to the security restrictions normally imposed during ordinary Web surfing.

To avoid this, an application must explicitly filter or otherwise re-encode user-supplied data before it inserts it into output destined for the user’s Web browser. Therefore, what follows is a list of typical output functions; your job is to determine if any of the functions print out tainted data that has not been passed through some sort of HTML escaping function. An HTML escape routine will either remove any found HTML elements or encode the various HTML metacharacters (particularly replacing the "<" and ">" characters with "&lt;" and "&gt;" respectively) so the result will not be interpreted as valid HTML. Looking for CSS vulnerabilities is tough; the best place to start is with the common output functions used by your language:
  • C/C++ Calls to printf(), fprintf(), output streams, and so on.
  • ASP Calls to Response.Write and Response.BinaryWrite that contain user variables, and direct variable output using <%=variable%> syntax.
  • Perl Calls to print, printf, syswrite, and write that contain variables holding user-supplied data.
  • PHP Calls to print, printf, and echo that contain variables that may hold user-supplied data.
  • TCL Calls to puts that contain variables that may hold user-supplied data.
In all languages, you need to trace back to the origin of the user data and determine if the data goes through any filtering of HTML and/or scripting characters. If it doesn’t, an attacker could use your Web application for a CSS attack against another user (taking advantage of your user/customer due to your application’s insecurity).
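
As a rough illustration of the escaping step described above, here is a minimal Python sketch. The function name render_greeting and the sample input are our own assumptions for the example; html.escape is the standard library routine that encodes the HTML metacharacters.

```python
import html


def render_greeting(user_supplied_name: str) -> str:
    """Build an HTML fragment, escaping the tainted value before output."""
    # html.escape encodes &, <, > and (with quote=True) both quote characters.
    safe_name = html.escape(user_supplied_name, quote=True)
    return "<p>Hello, " + safe_name + "!</p>"


if __name__ == "__main__":
    # The script tag is emitted as inert text rather than as markup.
    print(render_greeting('<script>alert("x")</script>'))
```

When auditing, the question is whether every path from user input to an output function passes through a routine like this one; a single unescaped path is enough for an attack.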

Information Disclosure

Information disclosure is not a technical problem per se. It’s quite possible that your application may provide an attacker with an insightful piece of knowledge that could aid him in taking advantage of the application. Therefore, it’s important to review exactly what information your application makes available.

Some general things to look for in all languages include:
  • Printing sensitive information (passwords, credit card numbers) in full display Many applications do not transmit full credit card numbers; rather, they show only the last four or five digits. Passwords should be obfuscated so a passerby cannot spot the actual password on a user’s terminal.
  • Displaying application configuration information, server configuration information, environment variables, and so on Doing so may aid an attacker in subverting your security measures. Providing precise details may help an attacker infer misconfigurations or lead him to specific vulnerabilities.
  • Revealing too much information in error messages This is a particularly sinful area. Failed database connections typically spit out connection details that include database host address, authentication details, and target tables. Failed queries can expose table layout information, such as field names and data types (or even expose the entire SQL query). Failed file inclusion may disclose file paths (virtual or real), which allows an attacker to determine the layout of the application.
  • Using public debugging mechanisms in production applications By "public" we mean any debugging information possibly provided to the user. Writing debugging information to a log on the application server is quite acceptable; however, none of that information should be shown to (or be accessible by) the user.
Because the actual method of information disclosure can widely vary within any language, there are no exact functions or code snippets to look for.
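
Although there is no fixed pattern to search for, the general remedy for verbose error messages can be sketched. The following Python example is only an illustration under our own assumptions: the logger name "webapp", the function run_query_safely, and the incident reference scheme are all hypothetical. It writes the full failure details to the server-side log and hands the user a generic message.

```python
import logging
import uuid

logger = logging.getLogger("webapp")  # hypothetical logger name


def run_query_safely(query_func):
    """Run query_func; log failure details server-side, return a generic message."""
    try:
        return {"ok": True, "data": query_func()}
    except Exception:
        # The traceback (hosts, credentials, SQL text) goes to the log only.
        incident_id = uuid.uuid4().hex[:8]
        logger.exception("query failed (incident %s)", incident_id)
        # The user sees nothing but an opaque reference to quote to support.
        return {
            "ok": False,
            "message": "An internal error occurred. Reference: " + incident_id,
        }
```

During a review, compare what is passed to the logging calls with what is passed to the output functions; only the former should ever contain connection details or query text.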

Checking for File System Access/Interaction

The Web is basically a graphically based file sharing protocol; the opening and reading of user-specified files is the core of what makes the Web run. Therefore, it’s not far off base for Web applications to interact with the file system as well. Essentially, you should definitively know exactly where, when, and how a Web application accesses the local file system on the server. The danger lies in using filenames that contain tainted data.

Depending on the language, file system functions may operate on a filename or a file descriptor. File descriptors are special variables that are the result of an initial function that preps a filename for use by the program (typically by opening it and returning a file descriptor, sometimes referred to as a handle). Luckily, you do not have to concern yourself with every interaction with a file descriptor; instead, you should primarily focus on functions that take filenames as parameters—especially ones that contain tainted data.

NOTE:
An entire myriad of file system–related problems exists that deal with temporary files, symlink attacks, race conditions, file permissions, and more. The breadth of these problems is quite large—particularly when considering the many available languages. However, all these problems are limited (luckily) to the local system that houses the Web application. Only attackers able to log in to that system would be able to potentially exploit those vulnerabilities. We are not going to focus on this realm of problems here, because best practice dictates using dedicated Web application servers (which don’t allow normal user access).

Specific functions that take filenames as a parameter include:
  • C/C++ Compiling a definitive list of all file system functions in C/C++ is definitely a challenge, due to the number of external libraries and functions available. Therefore, for starters, you should look at calls to the following functions: open(), fopen(), creat(), mknod(), catopen(), dbm_open(), opendir(), unlink(), link(), chmod(), stat(), lstat(), mkdir(), readlink(), rename(), rmdir(), symlink(), chdir(), chroot(), utime(), truncate(), and glob().
  • ASP Calls to Server.CreateObject() that create Scripting.FileSystemObject objects. Access to the file system is controlled via the use of the Scripting.FileSystemObject; so if the application doesn’t use this object, you don’t have to worry about file system vulnerabilities. The MapPath function is typically used in conjunction with file system access, and thus serves as a good indicator that the ASP page does somehow interact with the file system on some level.
    • Uses of the ChooseContent method of an IISSample.ContentRotator object (look for Server.CreateObject() calls for IISSample.ContentRotator).
  • Perl Calls to the functions chmod, chown, link, lstat, mkdir, readlink, rename, rmdir, stat, symlink, truncate, unlink, utime, chdir, chroot, dbmopen, open, sysopen, opendir, and glob.
    • Look for uses of the IO::* and File::* modules; each of these modules provides (numerous) ways to interact with the file system and should be closely observed (you can quickly find uses of module functions by searching for the IO:: and File:: prefix).
NOTE:
Technically, it’s possible to import module functions into your own namespace in Perl and Python; this means that the module:: (as in Perl) and module. (as in Python) prefixes may not necessarily be used.

  • PHP Calls to the functions opendir(), chdir(), dir(), chgrp(), chmod(), chown(), copy(), file(), fopen(), get_meta_tags(), link(), mkdir(), readfile(), rename(), rmdir(), symlink(), unlink(), gzfile(), gzopen(), readgzfile(), fdf_add_template(), fdf_open(), and fdf_save().
    • One interesting thing to keep in mind is that PHP’s fopen has what is referred to as a "fopen URL wrapper." This allows you to open a "file" contained on another site by using a command such as fopen("http://www.neohapsis.com/","r"). This compounds the problem because an attacker can trick your application into opening a file contained on another server (and thus, probably controlled by him).
  • Python Calls to the open function.
    • If the os module is imported, you need to look for the functions os.chdir, os.chmod, os.chown, os.link, os.listdir, os.mkdir, os.mkfifo, os.remove, os.rename, os.rmdir, os.symlink, os.unlink, and os.utime.
NOTE:
The os module functions may also be available if the posix module is imported, possibly using a posix.* prefix instead of os.*. The posix module actually implements many of the functions, but we recommend that you use the os module’s interface and not call the posix functions directly.
  • Java Check to see if the application imports any of the following packages: java.io.*, java.util.zip.*, or java.util.jar.*. If so, the application can possibly use one of the file streams contained in the package for interacting with a file. Luckily, however, most file usage depends on the File class contained in java.io. Therefore, you mainly need to look for the creation of new File objects (File variable = new File ...), along with stream constructors such as FileInputStream and FileReader, which also accept a filename string directly.
    • The File class itself has many methods that need to be checked: mkdir, renameTo.
  • TCL Check all uses of the file* commands (which will appear as two words, file operation, where the operation will be a specific file operation, such as rename).
    • Uses of the glob and open functions.
  • JSP Use of the <%@include file=’filename’%> statement. The file inclusion specified happens at compile time, which means the filename cannot be altered by user data. Even so, keeping tabs on what files are being included in your application is wise.
    • Use of the jsp:forward and jsp:include tags. Both load other files/pages for continued processing and accept dynamic filenames.
  • SSI Uses of the <!--#include file=""--> (or <!--#include virtual=""-->) tags.
  • ColdFusion Uses of the CFFile and CFInclude tags.
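
Once you have located a call that takes a filename built from tainted data, the usual question is whether the application confines that name to an intended directory. The Python sketch below shows one way such a check can look; the function name resolve_under_base and the "uploads" directory are assumptions made for the example, not taken from any particular application.

```python
import os


def resolve_under_base(base_dir: str, user_filename: str) -> str:
    """Return the real path of user_filename, refusing anything outside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(base, user_filename))
    # commonpath compares whole path components, so "/srv/app2" does not
    # pass as being inside "/srv/app".
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the permitted directory")
    return candidate


if __name__ == "__main__":
    uploads = os.path.join(os.getcwd(), "uploads")  # hypothetical directory
    print(resolve_under_base(uploads, "report.txt"))
```

A call such as resolve_under_base(uploads, "../../etc/passwd") raises ValueError instead of returning a path. Code that opens a user-supplied filename without a comparable check deserves a red flag.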
Checking External Program and Code Execution

Hopefully, all the logic and functionality will stay within your application and your programming language’s core functions. However, with the greater push toward modular code over the last number of years, oftentimes your program will make use of other programs and functions not contained within it. This is not necessarily a bad thing, because a programmer should definitely not reinvent the wheel (introducing potential security problems in the process). However, how your program interacts with external applications is an important question that must be answered, especially if that interaction involves the user to some degree.

Calling External Programs

All calls to external programs should be evaluated to determine exactly what they are calling. If tainted user data is included within the call, it may be possible for an attacker to trick the command processor into executing additional commands (perhaps by including shell metacharacters), or changing the intended command (by adding additional command-line parameters). This is an age-old problem with Web CGI scripts; the first CGI scripts called external UNIX programs to do their work, passing user-supplied data to them as parameters. It wasn’t long before attackers realized they could manipulate the parameters to execute other UNIX programs in the process.

Various things to look for include:
  • C/C++ The exec* family of functions (execl(), execv(), execve(), and so on), along with system() and popen().
  • Perl Review all calls to system, exec, `` (backticks), qx//, and <> (the globbing function).
    • The open call supports what’s known as "magic" open, allowing external programs to be executed if the filename parameter begins or ends with a pipe ("|") character. You’ll need to check every open call to see if a pipe is used, or more importantly, if it’s possible that tainted data passed to the open call contains the pipe character. There are also various open command functions contained in the Shell, IPC::Open2, and IPC::Open3 modules. You will need to trace the use of these modules’ functions if your program imports them.
  • TCL Calls to the exec command.
  • PHP Calls to the functions exec(), passthru(), system(), popen(), and fopen().
  • Python Check to see if the os (or posix) module is loaded. If so, you should check each use of the os.exec* family of functions: os.execl, os.execle, os.execlp, os.execv, os.execve, os.execvp, and os.execvpe. Also check for os.popen and os.system (or possibly posix.popen and posix.system).
    • You should be wary of functionality available in the rexec module; if this module is imported, you should carefully review all uses of rexec.* commands.
  • SSI Use of the <!--#exec cmd=""--> tag.
  • Java Check for uses of Runtime.exec(). The java.lang package that contains it is imported implicitly, so there may be no import statement to search for.
  • ColdFusion Use of the CFExecute and CFServlet tags.
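
To make the shell metacharacter problem concrete, here is a small Python sketch of the safer calling pattern. It assumes Python 3.7 or later; the function name echo_argument and the tiny child script are our own inventions for the demonstration. The untrusted value is passed as a single argument in a list, so no command processor ever parses it.

```python
import subprocess
import sys


def echo_argument(untrusted: str) -> str:
    """Pass untrusted text to a child process as one argv entry, with no shell."""
    child_code = "import sys; print(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, untrusted],  # list form: no shell parsing
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # The semicolon stays ordinary data; no second command is started.
    print(echo_argument("hello; echo injected"))
```

By contrast, building one command string from the same data and handing it to os.system or to subprocess with shell=True would let the semicolon start a second command. That string-building pattern is what to look for during review.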
Dynamic Code Execution

Many languages (especially the scripting languages, such as Perl, Python, TCL, and so on) contain mechanisms to interpret and run native scripting code. For example, a Python script can take raw Python code and execute it via the compile command. This allows the program to "build" a subprogram dynamically or allow the user to input scripting code (fragments). However, the scary part is that the subprogram has all the privileges and functionality of the main program. If a user can insert his own script code to be compiled and executed, he can effectively take control of the program (limited only by the capabilities of the scripting language being used). This vulnerability is typically limited to script-based languages.

The various commands that cause code compilation/execution include:
  • TCL Uses of the eval and expr commands.
  • Perl Uses of the eval function and do, and any regex operation with the e modifier.
  • Python Uses of the commands exec, compile, eval, execfile, and input.
  • ASP Certain ASP interpreters may have Eval, Execute, and ExecuteGlobal available.
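
When one of these commands is found operating on tainted data, a common fix is to replace it with a parser that accepts data but not code. As a hedged Python illustration (the wrapper name parse_user_literal is hypothetical), ast.literal_eval from the standard library accepts only literals such as numbers, strings, lists, and dictionaries:

```python
import ast


def parse_user_literal(text: str):
    """Parse a Python literal from untrusted text without executing code."""
    try:
        # literal_eval rejects function calls, attribute access, and names.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError("input is not a plain literal") from exc


if __name__ == "__main__":
    print(parse_user_literal("[1, 2, 3]"))
```

Passing "__import__('os').system('id')" to this function raises ValueError, whereas the built-in eval would have run the command. Very large or deeply nested input can still exhaust memory or the recursion limit, so input size limits remain worthwhile.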
External Objects/Libraries

Besides the dynamic generation and compilation of program code (discussed earlier), a program can also choose to load or include a collection of code (commonly referred to as a library) that is external to the program. These libraries typically include common functions helpful in making the design of a program easier, specialty functions meant to perform or aid in specific operations, or custom collections of functions used to support your Web application. Regardless of what functions a library may contain, you have to ensure the program loads the exact library intended. An attacker may be able to coerce your program into loading an alternate library, which could provide him an advantage. When you review your source code, you must ensure that all external library loading routines do not use any sort of tainted data.

NOTE:
External library vulnerabilities are technically the same as the file system interaction vulnerabilities discussed previously. However, external libraries have a few associated nuances (particularly in the methods/functions used to include them) that warrant them being a separate problem area.

The following is a quick list of functions used by the various languages to import external modules. In all cases, you should review the actual modules being imported, checking to see if it’s possible for a user to modify the importation process (via tainted data in the module name, for example).
  • Perl import, require, use, and do
  • Python import and __import__
  • ASP Server.CreateObject(), and the <OBJECT runat="server"> tag when found in global.asa
  • JSP jsp:useBean
  • Java URLClassLoader and JarURLConnection from the java.net package; ClassLoader, Runtime.load, Runtime.loadLibrary, System.load, and System.loadLibrary from the java.lang package
  • TCL load, source, and package require
  • ColdFusion CFObject
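
If an application genuinely must choose a module at run time, one defensive pattern is to compare the requested name against a fixed list before loading anything. The Python sketch below illustrates the idea; ALLOWED_PLUGINS and load_plugin are names we made up for the example, and the two permitted modules are arbitrary standard library choices.

```python
import importlib

# Hypothetical allowlist: the only modules the application may load by name.
ALLOWED_PLUGINS = {"json", "csv"}


def load_plugin(requested_name: str):
    """Import a module only if its name appears on the fixed allowlist."""
    if requested_name not in ALLOWED_PLUGINS:
        raise ValueError("module not permitted: " + repr(requested_name))
    return importlib.import_module(requested_name)


if __name__ == "__main__":
    print(load_plugin("json").dumps({"loaded": True}))
```

In a review, any import routine whose argument can be traced back to user data, and that lacks a check of this kind, is a candidate problem.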
Checking Structured Query Language (SQL)/Database Queries

This is a more recent emerging area of vulnerability specifically due to the growing use of databases in conjunction with Web applications. Obviously, databases make for great central repositories for storing, parsing, and retrieving a variety of information. The largest area of vulnerability lies in the use of the database SQL, which is a standard, human-oriented query language used to perform operations on a database. The specific vulnerability has to do with SQL being human-oriented, or better put, being natural-language oriented. This means that an actual SQL query is designed to be readable and understandable by humans, and that computers must first parse and figure out exactly what the query was intended to do. Due to the nature of this approach, an attacker may be able to modify the intent of the human-readable SQL language, which in turn results in the database believing the query has a completely different meaning.

NOTE:
The exact level of risk associated with SQL-related vulnerabilities is directly dependent on the particular database software you use and the features that software provides.

But this isn’t the only SQL/database vulnerability. The significant areas of vulnerability fall into one of two types:
  • Connection setup
  • Tampering with queries
During the setup of connections with a database, you need to look at the application and determine where the application initially connects to the database. Typically, a connection is made before queries can be run. The connection usually contains authentication information: username, password, database server, table name, and so on. This authentication information should be considered sensitive, and therefore the application should be examined on how it stores this information before, during, and after use (upon connecting to the database). Of course, none of the authentication information used during connection setup should contain tainted data; otherwise, the tainted data needs to be analyzed to determine if a user could potentially supply or alter the credentials used to establish a connection to the database server.

As discussed in Chapter 4, "Vulnerable CGI Scripts," when we talked about SQL Injection, tampering with queries is a common vulnerability. The dynamic nature of Web applications dictates that they somehow dynamically process a user’s request. Databases allow the program (on behalf of the user) to query for a particular set of data within the supplied parameters, and/or to store the resulting data into the database for later use. The biggest problem is that this involves actually inserting the tainted data into the query itself in some form or another. An attacker may be able to submit data that, when inserted into a SQL query, will trick the SQL/database server into executing different queries than the one intended. This could allow an attacker to tamper with the data contained in the database, view more data than was intended to be viewed (particularly records of other users), and bypass authentication mechanisms that use user credentials stored in a database.
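
The difference between a tampered query and a safe one is easiest to see side by side. The following Python sketch uses the standard sqlite3 module with an in-memory database; the users table, the function names, and the payload are assumptions chosen for the demonstration.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the tainted value becomes part of the SQL text itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameter binding: the value is passed separately and never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row comes back
    print(find_user_safe(conn, payload))    # no rows match
```

The unsafe version is the pattern to hunt for: a query string assembled by concatenating or interpolating a variable that can be traced back to user input.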

Given the two problem areas, the following list of functions/commands will lead you to potential problems:
  • C/C++ Unfortunately, no "standard" library exists for accessing various external databases. Therefore, you will have to do a little legwork on your own and determine what function(s) are used to establish a connection to the database and what function(s) are used to prepare/perform a query on the database. After that’s determined, you just search for all uses of those target functions.
  • PHP Calls to the functions ifx_connect(), ifx_pconnect(), ifx_prepare(), ifx_query(), msql_connect(), msql_pconnect(), msql_db_query(), msql_query(), mysql_connect(), mysql_db_query(), mysql_pconnect(), mysql_query(), odbc_connect(), odbc_exec(), odbc_pconnect(), odbc_prepare(), ora_logon(), ora_open(), ora_parse(), ora_plogon(), OCILogon(), OCIParse(), OCIPLogon(), pg_connect(), pg_exec(), pg_pconnect(), sybase_connect(), sybase_pconnect(), and sybase_query().
  • ASP Database connectivity is handled by the ADODB.* objects. This means that if your script doesn’t create an ADODB.Connection or ADODB.Recordset object via the Server.CreateObject function, you don’t have to worry about your script containing ADO vulnerabilities. If your script does create ADODB objects, you need to look at the Open methods of the created objects.
  • Java Java uses the JDBC (Java Database Connectivity) interface stored in the java.sql package. If your application uses the java.sql package, you need to look at the uses of the createStatement() and execute() methods.
  • Perl Perl can use the generic database-independent DBI module, or the database-specific DBD::* modules. The functions exported by each module vary widely, so you should determine which (if any) of the modules are loaded and find the appropriate functions.
  • ColdFusion The CFInsert, CFQuery, and CFUpdate tags handle interactions with the database.
Checking Networking and Communication Streams

Checking all outgoing and incoming network connections and communication streams used by a program is important. For example, your program may make an FTP connection to a particular server to retrieve a file. Depending on where tainted data is included, an attacker could modify which FTP server your program connects to, what user credentials are presented, or which file is retrieved. It’s also very important to know if the Web application sets up any listening server processes that answer incoming network connections. Incoming network connections pose many problems, because any vulnerability in the code controlling the listening service could potentially allow a remote attacker to compromise the server. Worse, custom network services, or services run in conjunction with unusual port assignments, may subvert any intrusion detection or other attack-alert systems you may have set up to monitor for attackers.

What follows is a list of various functions that allow your program to establish or use network/communication streams:
  • Perl and C/C++ Uses of the connect command indicate the application is making outbound network connections. "Connect" is a common name that may be found in other languages as well.
    • Uses of the accept command mean the application is potentially listening for inbound network connections. "Accept" is also a common name that may be found in other languages.
  • PHP Uses of the functions imap_open, imap_popen, ldap_connect, ldap_add, mcal_open, fsockopen, pfsockopen, ftp_connect, ftp_login, and mail.
  • Python Uses of the socket.*, urllib.*, and ftplib.* modules.
  • ASP Use of the Collaborative Data Objects (CDO) CDONTS.* objects; in particular, watch for CDONTS.Attachment, CDONTS.NewMail, AttachFile, and AttachURL. An attacker might be able to trick your application into attaching a file you don’t want to be sent out. This is similar to the file system-based vulnerabilities described earlier.
  • Java The inclusion of the java.net.* package(s), and especially the use of ServerSocket (which means your application is listening for inbound requests). Also, keep a watch for the inclusion of java.rmi.*. RMI is Java’s remote method invocation, which is functionally similar to CORBA.
  • ColdFusion Look for the tags CFFTP, CFHTTP, CFLDAP, CFMail, and CFPOP.
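
Where an outbound connection target is influenced by user data, the application should restrict where it is willing to connect. The Python sketch below shows one possible check; ALLOWED_HOSTS, the host files.example.com, and the function is_permitted_url are placeholders we chose for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of servers the application may contact.
ALLOWED_HOSTS = {"files.example.com"}


def is_permitted_url(url: str) -> bool:
    """Accept only https/ftp URLs whose host is exactly on the allowlist."""
    parts = urlsplit(url)
    # hostname is the parsed host only, so userinfo tricks such as
    # "https://files.example.com@evil.test/" do not pass.
    return parts.scheme in ("https", "ftp") and parts.hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    print(is_permitted_url("https://files.example.com/report.csv"))
    print(is_permitted_url("https://files.example.com.evil.test/report.csv"))
```

Comparing the parsed hostname exactly, rather than searching the URL string for a trusted name, is the design point here; substring checks are a frequent source of bypasses.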

PUTTING IT ALL TOGETHER

So, now that you have this large list of target functions/commands, how do you begin to look for them in a program? Well, the answer varies slightly, depending on your resources. On the simple side, you can use any editor or program with a built-in search/find function (even a word processor will do). Just search for each listed function, taking note of where it is used by the application and in what context. Programs that can search multiple files at one time (such as UNIX grep) are much more efficient; however, command-line utilities such as grep don’t let you interactively scroll through the program. We enjoy the use of the GNU less program, which allows you to view a file (or many files). It even has built-in search capability.

Windows users could use the DOS find command; Windows users may also want to investigate the use of a shareware programming code editor by the name of UltraEdit. UltraEdit (www.ultraedit.com) allows the visual editing of files and searching within a file or across multiple files. If you are really hard-pressed for searching multiple files on Windows, you can technically use the Windows Find Files feature, which allows you to search a set of files for a specified string. As we mentioned earlier in this chapter, SourceEdit from Brixoft (www.brixoft.net) can also be used to review source code in numerous languages. On the extreme end, uses of code and data modeling tools might point out subtle logic flaws and loops that are otherwise hard to notice by normal review. Whichever tool you use, however, ultimately the person who is performing the audit is best able to determine major issues in the code.
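
If none of these tools is at hand, the search itself is simple enough to script. The following Python sketch is a bare-bones stand-in for grep; the short RISKY_CALLS list and the function scan_source are our own assumptions and would need to be extended with the functions listed in this chapter for the language under review.

```python
import re

# A small, illustrative subset of the target functions.
RISKY_CALLS = ["system", "popen", "eval", "fopen", "mysql_query"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RISKY_CALLS) + r")\s*\("
)


def scan_source(text: str):
    """Return (line_number, function_name, stripped_line) for each match."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(1), line.strip()))
    return hits


if __name__ == "__main__":
    sample = "x = 1\nsystem($cmd);\n$r = mysql_query($q);\n"
    for hit in scan_source(sample):
        print(hit)
```

Note that this pattern only catches calls written with parentheses; languages such as Perl allow calls without them, so a plain-text search for the bare name is still needed there. Each hit is only a starting point for tracing the data back to its origin.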


SUMMARY

Making sure your Web applications are secure is a due-diligence task that administrators and programmers should undoubtedly perform, but a lack of expertise and time is sometimes an overriding factor. Therefore, it’s important to promote a simple method of secure code review anyone can tackle. Looking for specific problem areas and then tracing the program execution in reverse provides an efficient and manageable approach for wading through large amounts of code. By focusing on high-risk areas (buffer overflows, user output, file system interaction, external programs, and database connectivity), you can easily remove a vast number of common mistakes plaguing many Web applications found on the Net today.


SOLUTIONS FAST TRACK

How to Efficiently Trace through a Program
  • Tracing a program’s execution from start to finish is too time intensive.
  • You can save time by instead going directly to problem areas.
  • This approach allows you to skip benign application processing/calculation logic.
Auditing and Reviewing Selected Programming Languages
  • Use of popular and mature programming languages can help you audit the code.
  • Certain programming languages may have features that aid you in efficiently reviewing the code.
Looking for Vulnerabilities
  • Review how user data is collected.
  • Check for buffer overflows.
  • Analyze program output.
  • Review file system interaction.
  • Audit external component use.
  • Examine database queries and connections.
  • Track use of network communications.
Putting It All Together
  • Use tools such as UNIX grep, GNU less, the DOS find command, UltraEdit, or SourceEdit to look for the functions previously listed.

FREQUENTLY ASKED QUESTIONS

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form.

Q: This is tedious. Do any automated tools do this work?

A: Due to the custom and dynamic nature of source code, it’s very hard to design a tool that is capable of understanding what the developer intended and how an attacker might subvert that. Tools such as SourceEdit help highlight some problem areas, but these tools are far from becoming an automated replacement.

Q: Will outside companies check our source code for us?

A: We suggest you check SecurityFocus.com. SecurityFocus.com maintains a multivendor security service offerings directory, which includes a list of companies that perform formal code audits.

Q: Where can I find information online about potential threats and how to defend against them?

A: Lincoln Stein has written the Web Security FAQ, available online at www.w3.org/Security/Faq/www-security-faq.html. There is also the Secure Programming for Linux and UNIX HOWTO (which includes C/C++, Java, TCL, Python, and Perl) available at www.dwheeler.com/secure-programs.

Q: Where’s the best place to find out more information regarding secure coding in my particular language?

A: The vendor of the particular programming language is definitely the best place to start. However, some languages (such as C/C++, TCL, and so on) don’t have official "vendors," although many support sites exist. For example, perl.com features a wealth of information for Perl programmers.


