The goal of this chapter is for any computer-literate individual to be able
to take an already-developed piece of code and determine if it has fundamental
security problems. We provide you with a detailed list of problem areas pertaining to
various popular programming languages, and show you how to use such a list in
assessing the source code of a Web application.
Designing a program from scratch allows you to incorporate security from the
beginning, or at least be familiar enough with the program to rationalize potential
vulnerable areas in the code. However, as an administrator or developer, you may
face various alternate situations: You may have joined a development project already
in progress, thus inheriting someone else’s code. Or you have made the decision to
use third-party code (such as an open source library or CGI application). Or, as an
administrator, you’re worried about the quality of code your internal developers are
putting on your system.
In all these situations, it really helps to be able to quickly and efficiently review
the code for problems. You don’t have to be a programmer extraordinaire to perform
a basic code review; and even if you can’t follow some of the specific programming
nuances, you can at least raise red flags for later review by a more knowledgeable
individual. First, we look at how to efficiently
trace through a program, effectively giving you a game plan on where to start. Then,
we survey some particularly popular programming languages used for Web application
programming, followed by a long list of problem areas and the details associated
with each language.
HOW TO EFFICIENTLY TRACE THROUGH A PROGRAM
Let’s face it: There are not enough hours in the day for some things. Spending a few
days reviewing piles of source code looking for potential security problems is definitely
inefficient, not to mention time consuming (unless you’re being paid to do it).
If it’s a small program with a linear logic flow (that is, the program isn’t highly interactive
nor does it contain a lot of branching logic), the task may not be that hard;
however, if the program is of moderate size, reviewing it can be a headache. This
headache is compounded if the source code is distributed among multiple components,
contained in multiple files. Starting at the beginning of the program and then
stepping through every possible execution path becomes nearly impossible.
This chapter illustrates a different technique for approaching source code
reviews. Rather than trace the program forward through execution, we take the
reverse approach: proceed directly to the potential problem areas, and then trace back
through the program to confirm whether they are vulnerable. Technically, we’re only
interested in the execution paths that involve the user; however, trying to follow
those paths can be excruciating because data supplied by a user can go every which
way after the program starts processing it. So instead, we start at the end and then
trace the flow in reverse to see if we encounter a user path. Thus, the emphasis is
really on looking for vulnerabilities that involve user-supplied data in some way,
shape, or form.
NOTE: When reviewing code, we don’t need to bother looking at areas where
the program internally generates the data, because we assume the program
will not try to exploit itself.
The logic behind this approach is simple and best illustrated with an example.
Say you had a program that queried the user for a set of particular numeric values.
The program then proceeded to perform a large (possibly superfluous) amount of
calculations on those values, incorporating values submitted from other users (pulled
from a database), calculating and correlating various trends, and finally storing the
results in a database record.
Now, the code to perform those calculations may be complex, intense, and
exhaustive to try to step through. However, from a security standpoint, it’s easy: We
can, for the most part, ignore it. We’re not here to make sure the program works as
intended; we’re here to find potential vulnerabilities. Taking that example, we can
narrow it down to three potential problem areas:
- Initial data supplied by the user (and its validity)
- Reading of additional values from the database during the processing
- Storing of the final result into the database
The values supplied by the user should be initially checked to see if they are
valid data types (in this case, they are all numeric). Looking at the point of data entry
(when the data is received from the user) will determine this.
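The numeric check described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the helper name parse_user_number() is hypothetical, and a real application may need tighter range limits than "fits in a long."

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if text is a
 * complete base-10 integer that fits in a long; returns 0 otherwise.
 * Note that strtol() itself tolerates leading whitespace and a sign. */
int parse_user_number(const char *text, long *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return 0;                 /* reject missing or empty input */

    errno = 0;
    value = strtol(text, &end, 10);

    if (errno == ERANGE)
        return 0;                 /* value does not fit in a long */
    if (end == text || *end != '\0')
        return 0;                 /* no digits, or trailing junk such as "12abc" */

    *out = value;
    return 1;
}
```

When reviewing, the question is whether every user-supplied value passes through a check of this kind at the point of entry, before it is used anywhere else.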
The intermediary values must be read from the database safely. Looking
specifically at the SQL/database queries made lets you see if they (potentially) use
any user-supplied data in the actual query; if they don’t, they can be considered
"controlled," and thus safe.
Tools & Traps... Fill Your Toolbox
The grep command-line tool is extremely useful. grep is a UNIX-originated tool
used to search files (particularly text files) for particular strings of text. It will
output the actual context where the specified string was found, associated line
numbers, surrounding lines of text, and so on. You can also tell grep to search
multiple files. This makes grep a useful, albeit simplistic, tool to use. Because grep
has many different implementations, we recommend using GNU grep; it’s
free and packed full of useful features/options. grep has versions compiled for
the Windows platform as well (although the "find" command shipped with
Windows provides the same general functionality). It is available for download
from www.gnu.org/software/grep/.
Other tools to review source code can readily be found on the Internet. A
popular tool is SourceEdit from Brixoft (www.brixoft.net). SourceEdit allows you
to review source code for the most common programming languages (C/C++,
C#, Visual Basic, Pascal, Java, ASP, PHP, Perl, Cold Fusion, SQL, HTML, CSS, and
XML). If you want to review code that isn’t natively supported by SourceEdit, you
can either install language files or create new ones using its Language Editor. It
also includes a wide range of useful features, including code completion, function
list, a hex editor, and other custom tools.
Storing the result should be done in a secure manner. This is a matter of looking
at the construction of the SQL/database query used to store the result. As long as
the result is properly controlled and filtered, the database update can be considered
safe. And thus, we have just given the application a brief security code review
without having to actually deal with all that complex application calculation logic.
Now obviously this method isn’t foolproof; however, it still stands as an
efficient approach for individuals who are not programming savvy.
As with any code review, this approach assumes you have all the source available
for the application in question. There are times when an application may use
external libraries or components; if you don’t have the source to these components,
you are limited to two options: meticulously inspecting all data given to and received
from the external library/program (reducing the potential for problems within the
external portion), or blindly trusting it. Which route you choose depends on the circumstances.
You can probably trust system libraries, but be suspicious of other third-party
code. When in doubt, go with your instincts. If your instincts are failing you,
then be paranoid instead and don’t trust it; you can never be too cautious.
In this approach, we will also be focusing on programmatic issues; that is,
we will focus on the actual (mis)uses of certain functions and the programming language
in general. We do not focus on logic-based security flaws, because they require
the expertise of knowing exactly what a program is attempting to do, how it implements
that logic, where it is making assumptions, and where it might fail. And of course, all
of those items vary from one application to the next, because they are dependent on
how the application was coded in the first place. Any programmer could take an infinite
number of directions to solve a problem, and attempting to make a security
checklist of where each method contains problems (logically) is an exercise in
futility. If you must tend to such areas, we recommend a review by a professional
security reviewer skilled in the programming language of your application.
AUDITING AND REVIEWING SELECTED PROGRAMMING LANGUAGES
Many programming languages are available on the market today. Due to the explosion
of Web application development, there even happen to be a few Web-centric
ones. Choosing the right language is a black art; each has its pros and cons when it
comes to being used for Web applications. This chapter doesn’t care about the actual
usefulness and appropriateness of each language; instead, we concern ourselves only
with aspects that relate to efficient code auditing.
Java
Java code can come in many flavors: self-contained applications, mobile applets,
beans, or even scriptable via Java Server Pages (JSP) and JavaScript. From this point
on, when we refer to "Java," we are referring to a bytecode compiled application,
applet, or bean; JavaScript and JSP will be considered separate (due to the characteristics
of what you would look for).
The "core" Java language basically consists of logic control statements and
class/package manipulation routines. The actual functionality is contained in various
external packages and classes, which are imported when needed. This aspect provides
a useful benefit to you as a reviewer: if the package/class is not imported or otherwise
loaded, you don’t have to worry about any potential security problems associated
with items in that package/class. For example, you don’t have to check for
file-related vulnerabilities if the java.io package(s) are not imported (bear in mind
that Java code can also reference a class by its fully qualified name, such as
java.io.File, without an import statement). You can find
more information on Java in Chapter 7, "Securing Your Java Code."
Java Server Pages
Java Server Pages (JSP), as mentioned earlier, are a scriptable version of Java that can
be embedded inline within the appropriate HTML document. JSP also has hooks to
interface with other server-side Java applets and beans. The JSP language itself is
fairly limited, serving more as "glue" between HTML and server-side Java applications.
However, in the seemingly Java-crazed world we currently live in (which has
nothing to do with the proliferation of Starbucks coffee shops), JSP has become the
latest rage.
Active Server Pages
In the Microsoft world, the actual scripting language behind Active Server Pages
(ASP) is VBScript. However, there are various third-party ASP emulators like Sun
Java System Active Server Pages (formerly Sun ONE and Chili!ASP) that technically
are not VBScript; therefore, we refer to the language simply as ASP.
ASP is a Visual Basic/VBScript derivative with a structure similar to Java; that
is, the basic language implements logic control statements, and all other functionality
is contained in external objects. This allows you to selectively look for vulnerability
areas based on what objects are being used by the code (like Java). Keep in mind that
to ease programmability, the Application, ObjectContext, Request, Response, Server,
and Session objects are automatically available in every script (that is, they do not
have to be imported).
Server Side Includes
Server Side Includes (SSI) were the ancestor of embedded inline server-side application
languages. SSI basically provides the simple functionality to include external
files, execute programs, and display variable contents within an HTML file. ASP
actually incorporates SSI functionality automatically; this needs to be kept in mind
when auditing ASP Web applications.
SSI commands follow the simple format of <!--#command options-->, where
command would be the SSI operation (such as include, exec, and so on), and
options are various values that determine what the command is supposed to do.
Python
Python is a flexible object-oriented scripting language. Although the core Python
interpreter implements basic functionality and logic control, many functions are contained
in external modules, which have to be explicitly imported. Again, like Java
and ASP, this allows you to more efficiently audit the source code based on which
modules are imported.
The Tool Command Language
The Tool Command Language (Tcl) scripting language uses a natural language
syntax, which makes coding scripts more intuitive and easy to read. Although Tcl
(pronounced tickle) is typically used with its graphical counterpart, the associated
toolkit called Tk, Tcl has also been used by Web programmers for online Web CGIs. Also
similar to various previously mentioned languages, Tcl imports various functionalities
from external modules.
Practical Extraction and Reporting Language
Practical Extraction and Reporting Language (Perl) is a scripting language originally
implemented on UNIX platforms. In the past, it was a popular language to use for
CGI applications; however, the newer embedded scripting languages such as ASP, JSP,
ColdFusion, and PHP have definitely encroached on its reign. To make up for this,
newer offshoot Perl projects actually embed Perl into Apache (via mod_perl) and IIS
(via a Perl plug-in).
Perl implements a lot of functionality within the core language; however, Perl is
extensible via external modules. Although you could be selective on what you audit
based on imported modules, there is enough risk in the core language’s functionality
to make it imperative that you check for all problem areas.
PHP: Hypertext Preprocessor
PHP (PHP: Hypertext Preprocessor) is a server scripting language popular on the
UNIX platform, which has also become popular on Windows systems. PHP commands
are embedded inline similar to ASP and JSP. PHP doesn’t use dynamic-loading
modules; instead, all modules are included at the time the PHP engine is
compiled. This means that all functions are available at the application’s runtime,
forcing you to look for the entire breadth of vulnerable functions (you can’t take
shortcuts based on imported packages and modules, as in Java and ASP).
C/C++
C is the classic "workhorse" language, with its more modern object-oriented C++
derivative. The most recent variation of this language is C#, which Microsoft
released as the third generation of the C language. C and C++ are very powerful
languages, allowing low-level system access in many places. However, this power
comes at a price: C and C++ can be quite complex and ruthless. You have to
meticulously make sure everything is allocated, of the right size, and deallocated
when finished; no automatic variable expansion or garbage collection exists to make
your life easier.
NOTE: Technically, various C++ classes do handle automatic variable expansion
(making the variable larger when there’s too much data to put in it) and
garbage collection. However, such classes are not standard and vary
widely in features. C does not use such classes.
C/C++ can prove mighty challenging for you to thoroughly audit, due to the
extensive control an application has and the amount of things that could potentially
go wrong. Our best advice is to take a deep breath and plow forth, tackling as much
as you can in the process.
ColdFusion
ColdFusion is an inline HTML embedded scripting language by Allaire. Similar to
JSP, ColdFusion scripting looks much like HTML tags; therefore, you need to be
careful you don’t overlook anything nestled away inside what appears to be benign
HTML markup. ColdFusion is a highly database-centric language: its core functionality
consists mostly of database access, formatted record output, and light
string manipulation and calculation. However, ColdFusion is extensible via various
means (Java beans, external programs, objects, and so on), so you must always keep
tabs on what external functionality ColdFusion scripts may be using. You can find
more information on ColdFusion in Chapter 10, "Securing ColdFusion."
LOOKING FOR VULNERABILITIES
What follows is a collection of problem areas and the specific ways you can look for
them. Most of the problem areas are based on a single principle: use of a
function that interacts with user-supplied data. Realistically, you will want to look at
every such function, but doing so may require too much time. Therefore, we have
compiled a list of the "higher risk" functions that remote attackers have been
known to use to take advantage of Web applications.
Because the attacker will masquerade as a user, we only need to look at areas in
the code that are influenced by the user. However, you also have to consider other
untrusted sources of input into your program that influence program execution:
external databases, third-party input, stored session data, and so on. You must consider
that another poorly coded application may insert tainted SQL data into a
database, which your application would be unfortunate enough to read and potentially
be vulnerable to.
Getting the Data from the User
Before we start tracing problems in reverse, the first (and most important, in our
opinion) step is to zoom directly to the section of code that accepts the user’s data.
Ideally, all data collection from the user is centralized in one spot; in practice, however,
bits and pieces may be received from the user as the application progresses (typical
of interactive applications). Centralizing all user data input into one section (or a
single routine) serves two important functions: it allows you to see exactly what
pieces of data are accepted from a user and what variables the program puts them in,
and allows you to centrally filter incoming user data for illegal values.
For any language, first check to see if any of the incoming user data is put
through any type of filtering or sanity checks. Hopefully, all data input is done at a
central location, with the filtering/checking done immediately thereafter. The more
fragmented an application’s approach to filtering becomes, the more chances a variable
containing user data will be left out of the filtering mechanism(s). Also, knowing
ahead of time which variables contain user-supplied data simplifies following the
flow of user data through a program.
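As an illustration of a central filter, the following C sketch accepts a field only if it has a sane length and consists solely of explicitly allowed characters. The function name is_clean_field() and the particular set of allowed characters are assumptions made for this example; each application has to decide what is legal for each piece of data.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical central filter: accepts a value only if it is non-empty,
 * no longer than max_len, and made up solely of letters, digits,
 * '_' and '-'. Everything else is rejected rather than repaired. */
int is_clean_field(const char *value, size_t max_len)
{
    size_t i, len;

    if (value == NULL)
        return 0;
    len = strlen(value);
    if (len == 0 || len > max_len)
        return 0;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)value[i];
        if (!isalnum(c) && c != '_' && c != '-')
            return 0;             /* character is not on the allowed list */
    }
    return 1;
}
```

Listing what is allowed, rather than what is forbidden, means that a metacharacter nobody thought of is rejected by default.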
NOTE: Perl refers to any variable (and thus any command using that variable)
containing user data as "tainted." Thus, a variable is tainted until it is
run through a proper filter/validity check. We will use the term tainted
throughout the chapter. Perl actually has an official "taint" mode, activated
by the -T command-line switch. When activated, the Perl interpreter
will abort the program when a tainted variable is used in a potentially unsafe
operation (such as running an external command or opening a file for writing). Perl
programmers should consider using this handy security feature.
Looking for Buffer Overflows
Buffer overflows are one of the top flaws for exploitation on the Internet today. A
buffer overflow occurs when a particular operation/function writes more data into a
variable (which is actually just a place in memory) than the variable was designed to
hold. The result is that the data starts overwriting other memory locations without
the computer knowing those locations have been tampered with. To make matters
worse, some hardware architectures (such as Intel and Sparc) use the stack (a place in
memory for variable storage) to store function return addresses. Thus, the problem is
that a buffer overflow will overwrite these return addresses, and the computer, not
knowing any better, will still attempt to use them. If the attacker is skilled enough
to precisely control what values the return pointers are overwritten with, he can
control the computer’s next operation(s).
The two flavors of buffer overflows referred to today are "stack" and "heap."
Local variable storage (non-static variables defined within a function) is referred to as "stack"
because the variables are actually stored on the stack in memory. Heap data is the
memory that is dynamically allocated at runtime, such as by C’s malloc() function.
This data is not actually stored on the stack, but somewhere amidst a giant "heap" of
temporary, disposable memory used specifically for this purpose. Actually exploiting
a heap buffer overflow is much more involved, because there are no convenient
frame pointers (as are on the stack) to overwrite. Luckily, however, buffer overflows
are only a problem with languages that must predeclare their variable storage sizes
(such as C and C++). ASP, Perl, and Python all have dynamic variable allocation:
the language interpreter itself handles the variable sizes. This is rather handy, because
it makes buffer overflows a moot issue (the language will increase the size of the
variable if there’s too much data). However, C and C++ are still widely used languages
(especially in the UNIX world), and therefore buffer overflows are not going
to disappear anytime soon.
NOTE: More information on regular buffer overflows can be found in an article
by Aleph1 entitled Smashing the Stack for Fun and Profit. A copy is available
online at www.insecure.org/stf/smashstack.txt. Information on heap
buffer overflows can be found in the "Heap Buffer Overflow Tutorial" by
Shok, available at www.w00w00.org/files/articles/heaptut.txt.
The str* Family of Functions
The str* family of functions (strcpy(), strcat(), and so on) are the most notorious:
they all will copy data into a variable with no regard to the variable’s length.
Typically, these functions take a source (the original data) and copy it to a destination
(the variable).
In C/C++, you have to check all uses of the functions strcpy(), strcat(), strcadd(),
strccpy(), streadd(), strecpy(), and strtrns(). Determine if any of the
source data incorporates user-submitted data, which could be used to cause a buffer
overflow. If the source data does include user-submitted data, you must ensure that
the maximum length/size of the source (data) is smaller than the destination (variable)
size.
If it appears that the source data is larger than the destination variable, you
should then trace the exact origin of the source data to determine if the user could
potentially use this to his advantage (by giving arbitrary data used to cause a buffer
overflow).
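The length check described above might look like the following C sketch. The helper name copy_if_fits() and the NAME_SIZE constant are hypothetical; the point is only that the source length is compared against the destination size before strcpy() is ever called.

```c
#include <stddef.h>
#include <string.h>

#define NAME_SIZE 32   /* hypothetical destination size */

/* Copies src into dest only if it fits, including the terminating '\0'.
 * Returns 1 on success, 0 if the source is too long for the destination. */
int copy_if_fits(char *dest, size_t dest_size, const char *src)
{
    if (dest == NULL || src == NULL || dest_size == 0)
        return 0;
    if (strlen(src) >= dest_size)
        return 0;             /* would overflow: refuse rather than truncate */
    strcpy(dest, src);        /* safe here: the length was checked above */
    return 1;
}
```

A call such as copy_if_fits(name, sizeof name, user_input), with name declared as char name[NAME_SIZE], is the pattern to look for; a bare strcpy(name, user_input) with no preceding length check is the red flag.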
The strn* Family of Functions
A safer alternative to the str* family of functions is the strn* family (strncpy(),
strncat(), and so on). These are essentially the same as the str* family, except they
allow you to specify a maximum length (or a number, hence the n in the function
name). Properly used, these functions specify the source (data), destination (variable),
and maximum number of bytes, which must be no more than the size of the destination
variable! Therein lies the danger: Many people believe these functions to be
foolproof against buffer overflows; however, buffer overflows are still possible if the
maximum number specified is still larger than the destination variable.
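In C/C++, look for the use of strncpy() and strncat(). You need to check that
the specified maximum value is equal to or less than the destination variable size;
otherwise, the function is prone to potential overflow just like the str* family of
functions discussed in the preceding section. Two details deserve extra care: strncpy()
does not add a terminating null byte when the source is at least as long as the limit,
and the limit given to strncat() counts only the appended characters, so it must
leave room for the data already in the destination plus the terminating null byte.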
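A minimal C sketch of limits chosen correctly for both functions follows. The wrapper names bounded_copy() and bounded_append() are hypothetical and exist only to make the limit calculations visible.

```c
#include <stddef.h>
#include <string.h>

/* Bounded copy: always leaves dest NUL-terminated, truncating if needed. */
void bounded_copy(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return;
    strncpy(dest, src, dest_size - 1);   /* limit leaves room for '\0' */
    dest[dest_size - 1] = '\0';          /* strncpy() may not terminate */
}

/* Bounded append: the limit is the space still free in dest, minus the
 * terminating '\0'. Assumes dest already holds a NUL-terminated string. */
void bounded_append(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);

    if (used + 1 >= dest_size)
        return;                                /* no room left */
    strncat(dest, src, dest_size - used - 1);  /* strncat() adds its own '\0' */
}
```

When auditing, compare each limit you find against these two expressions; a limit equal to the full destination size passed to strncat() is a common mistake.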
NOTE: Technically, any function that allows for a maximum limit to be specified
should be checked to ensure the maximum limit isn’t set higher than it
should be (in effect, larger than the destination variable has allocated).
The *scanf Family of Functions
The *scanf family of functions "scans" an input source, looking to extract various
variables as defined by the given format string. This leads to potential problems if the
program is looking to extract a string from a piece of data, and it attempts to put the
extracted string into a variable that isn’t large enough to accommodate it.
First, you should check to see if your C/C++ program uses any of the functions
scanf(), sscanf(), fscanf(), vscanf(), vsscanf(), or vfscanf().
If it does, you should look at the use of each function to see if the supplied
format string contains any character-based conversions (indicated by the s, c, and [
tokens). If the format specified includes character-based conversions, you need to
verify that the destination variables specified are large enough to accommodate the
resulting scanned data.
NOTE: The *scanf family of functions allows for an optional maximum limit to
be specified. This is given as a number between the conversion token %
and the format flag (for example, %15s). This limit functions similarly to the limit found in the
strn* family functions.
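The following C sketch shows a field width that matches the destination size. The function name first_word() and the WORD_SIZE constant are hypothetical. Note that for the s conversion the width counts stored characters only; the terminating null byte is added on top of it, so the width must be one less than the buffer size.

```c
#include <stdio.h>

#define WORD_SIZE 16   /* hypothetical destination size */

/* Extracts the first whitespace-delimited word from line into word, which
 * must be at least WORD_SIZE bytes. Returns 1 if a word was read. */
int first_word(const char *line, char *word)
{
    /* The width 15 is WORD_SIZE - 1: %s stores at most 15 characters and
     * then appends the terminating '\0' itself. */
    return sscanf(line, "%15s", word) == 1;
}
```

A format such as "%s" with no width, or a width equal to or larger than the buffer size, is what to flag during a review.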
Other Functions Vulnerable to Buffer Overflows
Buffer overflows can also be caused in other ways, many of which are very hard to
detect. The following list includes some other functions that populate a
variable/memory address with data, making them susceptible to overflow. Some
miscellaneous functions to look for in C/C++ include:
- memcpy(), bcopy(), memccpy(), and memmove(): Similar to the
strn* family of functions (they copy/move source data to destination
memory/variable, limited by a maximum value). Like the strn* family, you
should evaluate each use to determine if the maximum value specified is
larger than the destination variable/memory has allocated.
- sprintf(), snprintf(), vsprintf(), vsnprintf(), swprintf(), and vswprintf():
Allow you to compose multiple variables into a final text string. You should
determine that the sum of the variable sizes (as specified by the given
format) does not exceed the maximum size of the destination variable. For
snprintf() and vsnprintf(), the maximum value should not be larger than the
destination variable’s size.
- gets() and fgets(): Read in a string of data from various file descriptors.
Both can possibly read in more data than the destination variable was allocated
to hold. The gets() function takes no limit at all, so every use on untrusted
input deserves scrutiny. The fgets() function requires a maximum limit to be specified;
therefore, you must check that the fgets() limit is not larger than the
destination variable size.
- getc(), fgetc(), getchar(), and read(): When used in a loop, these can
read in too much data if the loop does not properly stop
reading in data after the maximum destination variable size is reached. You
will need to analyze the logic used in controlling the total loop count to
determine how many times the code loops using these functions.
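Two of the items in the preceding list can be sketched in C as follows: an fgets() call whose limit is the real buffer size, and an fgetc() loop whose count is bounded by the destination. The function names read_line() and read_chars() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line with fgets(); the limit is the real size of the buffer.
 * Returns 1 if a line was read, 0 on end of file or error. */
int read_line(FILE *in, char *buf, size_t buf_size)
{
    if (buf_size == 0 || fgets(buf, (int)buf_size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 1;
}

/* Reads characters with fgetc() until newline, end of file, or the buffer
 * is full (buf_size - 1 characters). Returns the number stored. */
size_t read_chars(FILE *in, char *buf, size_t buf_size)
{
    size_t n = 0;
    int c;

    if (buf_size == 0)
        return 0;
    while (n < buf_size - 1 && (c = fgetc(in)) != EOF && c != '\n')
        buf[n++] = (char)c;           /* loop stops before the buffer overflows */
    buf[n] = '\0';
    return n;
}
```

In real code the loop condition is rarely this tidy, which is why each such loop needs to be read individually for the case where the input is longer than expected.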
Checking the Output Given to the User
Most applications will, at one point or another, display some sort of data to the user.
You would think that the printing of data is a fundamentally secure operation; but
alas, it is not. Particular vulnerabilities exist that have to do with how the data is
printed, and what data is printed.
Format String Vulnerabilities
Format string vulnerabilities are a class of vulnerability that arises from the *printf
family of functions (printf(), fprintf(), and so on). This class of functions allows you
to specify a "format" in which the provided variables are converted into string
format.
NOTE: Technically, the vulnerabilities described in this section are a form of buffer
overflow attack, but we classify them under this category because the commonly
misused printf() and vprintf() functions are normally used for
output.
The vulnerability arises when an attacker is able to specify the value of the
format string. Sometimes, this is due to programmer laziness. The proper way of
printing a dynamic string value would be:
printf("%s",user_string_data);
However, a lazy programmer may take a shortcut approach:
printf(user_string_data);
Although this does indeed work, a fundamental problem is involved: The function
is going to look for formatting commands within the supplied string. The user may
supply data the function believes to be formatting/conversion commands, and via this
mechanism she could cause a buffer overflow due to how those formatting/conversion
commands are interpreted (actual exploitation to cause a buffer overflow is a little
involved and beyond the scope of this chapter; suffice it to say that it definitely can be
done and is currently being done on the Internet as we speak).
NOTE: You can find more information on format string vulnerabilities in an
analysis written by Tim Newsham, available online at http://comsec.theclerk.com/CISSP/FormatString.pdf.
Format string bugs are, again, seemingly limited to C/C++. While other languages
have *printf functionality, their handling of these issues may exclude them
from exploitation. For example, Perl is not vulnerable (which stems from how Perl
actually handles variable storage). So, to find potential vulnerable areas in your
C/C++ code, you need to look for the functions printf(), fprintf(), sprintf(),
snprintf(), vprintf(), vfprintf(), vsprintf(), vsnprintf(), wsprintf(), and wprintf().
Determine if any of the listed functions have a format string containing user-supplied
data. Ideally, the format string should be static (a predefined, hard-coded
string); however, as long as the format string is generated and controlled internal to
the program (with no user intervention), it should be safe.
Home-grown logging routines (syslog, debug, error, and so on) tend to be culprits
in this area. They sometimes hide the actual avenue of vulnerability, requiring you to
backtrack through function calls. Imagine the following logging routine (in C):
void log_error (char *error){
    char message[1024];
    snprintf(message, 1024, "Error: %s", error);
    fprintf(LOG_FILE, message);   /* message is used as the format string */
}
Here we have fprintf() taking the message variable as the format string. This variable
is composed of the static string "Error:" and the error message passed to the
function. (Notice the proper use of snprintf to limit the amount of data put into the
message variable; even if it’s an internal function, it’s still good practice to safeguard
against potential problems.)
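So, is this a problem? Well, that depends on every use of the log_error() function:
if any caller passes user-supplied data as the error parameter, that data ends up in the
format string given to fprintf(). So now you should go back and look at every
occurrence of log_error(), evaluating the data being supplied as the parameter.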
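For comparison, a variant of the logging routine that is safe regardless of what its callers pass might look like the C sketch below. The name log_error_safe() is hypothetical, and passing the log stream as a parameter (instead of using LOG_FILE) is an assumption made so the example is self-contained.

```c
#include <stdio.h>

/* Variant of log_error() that never lets the message act as a format
 * string. The log stream is passed in as a parameter (an assumption made
 * for this sketch). */
void log_error_safe(FILE *log_file, const char *error)
{
    char message[1024];

    snprintf(message, sizeof message, "Error: %s", error);
    fputs(message, log_file);     /* written verbatim, no format parsing */
    /* Equivalent alternative: fprintf(log_file, "%s", message); */
}
```

With this version, an error string containing conversion commands such as %x or %n is simply written to the log as text.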
Cross-Site Scripting
Cross-site scripting (abbreviated CSS in this chapter, and more commonly XSS elsewhere)
is a particular concern due to its potential to trick a user.
CSS arises when a Web application takes user data and prints it back out to
the user without filtering it. It’s possible for an attacker to send a URL with
embedded client-side scripting commands; if the user clicks on this Trojaned URL,
the data will be given to the Web application. If the Web application is vulnerable, it
will give the data back to the client, thus exposing the client to the malicious
scripting code. The problem is compounded by the fact that the Web application
may be in the user’s trusted security zone; thus the malicious scripting code is not
limited to the same security restrictions normally imposed during normal Web
surfing.
To avoid this, an application must explicitly filter or otherwise re-encode user-supplied
data before it inserts it into output destined for the user’s Web browser.
Therefore, what follows is a list of typical output functions; your job is to determine
if any of the functions print out tainted data that has not been passed through some
sort of HTML escaping function. An HTML escape routine will either remove any
found HTML elements or encode the various HTML metacharacters (particularly
replacing the "<" and ">" characters with "&lt;" and "&gt;" respectively) so the
result will not be interpreted as valid HTML. Looking for CSS vulnerabilities is
tough; the best place to start is with the common output functions used by your language:
- C/C++: Calls to printf(), fprintf(), output streams, and so on.
- ASP: Calls to Response.Write and Response.BinaryWrite that contain
user variables, and direct variable output using <%=variable%> syntax.
- Perl: Calls to print, printf, syswrite, and write that contain variables
holding user-supplied data.
- PHP: Calls to print, printf, and echo that contain variables that may hold
user-supplied data.
- Tcl: Calls to puts that contain variables that may hold user-supplied data.
In all languages, you need to trace back to the origin of the user data and determine
if the data goes through any filtering of HTML and/or scripting characters. If it
doesn’t, an attacker could use your Web application for a CSS attack against another
user (taking advantage of your user/customer due to your application’s insecurity).
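As a point of reference for what an HTML escaping step looks like, here is a minimal Python sketch. It relies on html.escape from the Python standard library; the render_greeting function and its output format are hypothetical names made up for illustration, not part of any particular application.

```python
import html


def render_greeting(user_supplied_name: str) -> str:
    """Build an HTML fragment, escaping tainted data before it is output.

    render_greeting is a hypothetical helper used only for illustration.
    """
    # html.escape replaces &, <, > (and quotes when quote=True) with entities,
    # so the browser displays the text instead of interpreting it as markup.
    safe_name = html.escape(user_supplied_name, quote=True)
    return "<p>Hello, " + safe_name + "!</p>"


if __name__ == "__main__":
    print(render_greeting("<script>alert(1)</script>"))
```

When reviewing code, the thing to confirm is that every path from user data to an output function passes through a step of this kind.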
Information Disclosure
Information disclosure is not a technical problem per se. It’s quite possible that your
application may provide an attacker with an insightful piece of knowledge that could
aid him in taking advantage of the application. Therefore, it’s important to review
exactly what information your application makes available.
Some general things to look for in all languages include:
- Printing sensitive information (passwords, credit card numbers) in
full display Many applications do not transmit full credit card numbers;
rather, they show only the last four or five digits. Passwords should be obfuscated
so a passerby cannot spot the actual password on a user’s terminal.
- Displaying application configuration information, server configuration
information, environment variables, and so on Doing so may
aid an attacker in subverting your security measures. Providing concise
details may help an attacker infer misconfigurations or lead him to specific
vulnerabilities.
- Revealing too much information in error messages This is a particularly
sinful area. Failed database connections typically spit out connection
details that include database host address, authentication details, and target
tables. Failed queries can expose table layout information, such as field
names and data types (or even expose the entire SQL query). Failed file
inclusion may disclose file paths (virtual or real), which allows an attacker
to determine the layout of the application.
- Using public debugging mechanisms in production
applications By "public" we mean any debugging information possibly
provided to the user. Writing debugging information to a log on the application
server is quite acceptable; however, none of that information should
be shown to (or be accessible by) the user.
Because the actual method of information disclosure can widely vary within any
language, there are no exact functions or code snippets to look for.
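Although there is no fixed pattern to search for, it can help to know what a reasonable approach looks like. The following Python sketch shows two of the ideas above: masking a card number before display, and logging error details on the server while returning only a generic message to the user. The names mask_card_number, user_facing_error, and GENERIC_ERROR are hypothetical and exist only for this example.

```python
import logging

logger = logging.getLogger("webapp")  # hypothetical logger name

GENERIC_ERROR = "An internal error occurred. Please try again later."


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Return the card number with all but the last `visible` digits masked."""
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) <= visible:
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + "".join(digits[-visible:])


def user_facing_error(exc: Exception) -> str:
    """Log the details on the server; show the user nothing specific."""
    # The exception text (hosts, table names, paths) goes to the log only.
    logger.error("request failed: %r", exc)
    return GENERIC_ERROR
```

During a review, output that includes raw exception text, connection strings, or environment dumps is the red flag; a wrapper like the one sketched here is what you would hope to find instead.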
Checking for File System Access/Interaction
The Web is basically a graphically based file sharing protocol; the opening and
reading of user-specified files is the core of what makes the Web run. Therefore, it’s
not far off base for Web applications to interact with the file system as well.
Essentially, you should definitively know exactly where, when, and how a Web application
accesses the local file system on the server. The danger lies in using filenames
that contain tainted data.
Depending on the language, file system functions may operate on a filename or
a file descriptor. File descriptors are special variables that are the result of an initial
function that preps a filename for use by the program (typically by opening it and
returning a file descriptor, sometimes referred to as a handle). Luckily, you do not
have to concern yourself with every interaction with a file descriptor; instead, you
should primarily focus on functions that take filenames as parameters, especially
ones that contain tainted data.
NOTE: An entire myriad of file system–related problems exists that deal with
temporary files, symlink attacks, race conditions, file permissions, and
more. The breadth of these problems is quite large, particularly when
considering the many available languages. However, all these problems
are limited (luckily) to the local system that houses the Web application.
Only attackers able to log in to that system would be able to potentially
exploit those vulnerabilities. We are not going to focus on this realm of
problems here, because best practice dictates using dedicated Web
application servers (which don’t allow normal user access).
Specific functions that take filenames as a parameter include:
- C/C++ Compiling a definitive list of all file system functions in C/C++
is definitely a challenge, due to the amount of external libraries and functions
available. Therefore, for starters, you should look at calls to the functions
open(), fopen(), creat(), mknod(), catopen(), dbm_open(), opendir(),
unlink(), link(), chmod(), stat(), lstat(), mkdir(), readlink(), rename(), rmdir(),
symlink(), chdir(), chroot(), utime(), truncate(), and glob().
- ASP Calls to Server.CreateObject() that create Scripting.FileSystemObject
objects. Access to the file system is controlled via the use of the
Scripting.FileSystemObject; so if the application doesn’t use this object, you
don’t have to worry about file system vulnerabilities. The MapPath function
is typically used in conjunction with file system access, and thus serves as a
good indicator that the ASP page does somehow interact with the file
system on some level.
- Uses of the ChooseContent method of an IISSample.ContentRotator object (look for Server.CreateObject() calls for
IISSample.ContentRotator).
- Perl Calls to the functions chmod, chown, link, lstat, mkdir, readlink,
rename, rmdir, stat, symlink, truncate, unlink, utime, chdir, chroot,
dbmopen, open, sysopen, opendir, and glob.
- Look for uses of the IO::* and File::* modules; each of these modules
provides (numerous) ways to interact with the file system and
should be closely observed (you can quickly find uses of module functions
by searching for the IO:: and File:: prefix).
NOTE: Technically, it’s possible to import module functions into your own
namespace in Perl and Python; this means that the module:: (as in Perl)
and module. (as in Python) prefixes may not necessarily be used.
- PHP Calls to the functions opendir(), chdir(), dir(), chgrp(), chmod(),
chown(), copy(), file(), fopen(), get_meta_tags(), link(), mkdir(), readfile(),
rename(), rmdir(), symlink(), unlink(), gzfile(), gzopen(), readgzfile(),
fdf_add_template(), fdf_open(), and fdf_save().
- One interesting thing to keep in mind is that PHP’s fopen has what is
referred to as a "fopen URL wrapper." This allows you to open a "file"
contained on another site by using a command such as
fopen("http://www.neohapsis.com/","r"). This compounds the
problem because an attacker can trick your application into opening a
file contained on another server (and thus, probably controlled by him).
- Python Calls to the open function.
- If the os module is imported, you need to look for the functions
os.chdir, os.chmod, os.chown, os.link, os.listdir, os.mkdir, os.mkfifo,
os.remove, os.rename, os.rmdir, os.symlink, os.unlink, and os.utime.
NOTE: The os module functions may also be available if the posix module is
imported, possibly using a posix.* prefix instead of os.*. The posix
module actually implements many of the functions, but we recommend
that you use the os module’s interface and not call the posix functions
directly.
- Java Check to see if the application imports any of the following packages:
java.io.*, java.util.zip.*, or java.util.jar.*. If so, the application can possibly
use one of the file streams contained in the package for interacting
with a file. Luckily, however, all file usage depends on the File class contained
in java.io. Therefore, you really only need to look for the creation of
new File classes (File variable = new File ...).
- The File class itself has many methods that need to be checked,
including mkdir and renameTo.
- TCL Check all uses of the file* commands (which will appear as two
words, file operation, where the operation will be a specific file operation,
such as rename).
- Uses of the glob and open functions.
- JSP Use of the <%@ include file="filename" %> statement. The
file inclusion specified happens at compile time, which means the filename
cannot be altered by user data. Even so, keeping tabs on what files are
being included in your application is wise.
- Use of the jsp:forward and jsp:include tags. Both load other files/pages
for continued processing and accept dynamic filenames.
- SSI Uses of the <!--#include file="" --> (or <!--#include virtual="" -->)
tags.
- ColdFusion Uses of the CFFile and CFInclude tags.
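To make the danger of tainted filenames concrete, here is a short Python sketch of one common safeguard: resolving the requested name and confirming it still lies under an allowed base directory before any file function is called. The function name resolve_under_base and the directory used in the usage line are assumptions made for illustration; the os.path calls are from the Python standard library.

```python
import os


def resolve_under_base(base_dir: str, user_filename: str) -> str:
    """Return the absolute path for user_filename, or raise ValueError
    if the result would fall outside base_dir (for example via "..").

    resolve_under_base is a hypothetical helper for illustration only.
    """
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and follows symbolic links.
    candidate = os.path.realpath(os.path.join(base, user_filename))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("filename escapes the allowed directory")
    return candidate


if __name__ == "__main__":
    print(resolve_under_base("/srv/app/uploads", "report.txt"))
```

If the code you are reviewing passes user data to any of the functions listed above without a check along these lines, flag it for closer inspection.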
Checking External Program and Code Execution
Hopefully, all the logic and functionality will stay within your application and your
programming language’s core functions. However, with the greater push toward
modular code over the last number of years, oftentimes your program will make use
of other programs and functions not contained within it. This is not necessarily a bad
thing, because a programmer should definitely not reinvent the wheel (introducing
potential security problems in the process). However, how your program interacts
with external applications is an important question that must be answered, especially
if that interaction involves the user to some degree.
Calling External Programs
All calls to external programs should be evaluated to determine exactly what they
are calling. If tainted user data is included within the call, it may be possible for an
attacker to trick the command processor into executing additional commands (perhaps
by including shell metacharacters), or changing the intended command (by
adding additional command-line parameters). This is, it seems, an age-old problem with
Web CGI scripts; the first CGI scripts called external UNIX programs to do
their work, passing user-supplied data to them as parameters. It wasn’t long before
attackers realized they could manipulate the parameters to execute other UNIX programs
in the process.
Various things to look for include:
- C/C++ The exec* family of functions (execl(), execv(), execve(), and so
on), as well as system() and popen().
- Perl Review all calls to system, exec, `` (backticks), qx//, and <> (the
globbing function).
- The open call supports what’s known as "magic" open, allowing
external programs to be executed if the filename parameter begins or
ends with a pipe ("|") character. You’ll need to check every open call
to see if a pipe is used, or more importantly, if it’s possible that tainted
data passed to the open call contains the pipe character. There are also
various open command functions contained in the Shell, IPC::Open2,
and IPC::Open3 modules. You will need to trace the use of these
modules’ functions if your program imports them.
- TCL Calls to the exec command.
- PHP Calls to fopen() and popen().
- Python Check to see if the os (or posix) module is loaded. If so, you
should check each use of the os.exec* family of functions: os.execl, os.execv,
os.execve, os.execle, os.execlp, os.execvp, and os.execvpe. Also check for
os.popen and os.system (or possibly posix.popen and posix.system).
- You should be wary of functionality available in the rexec module; if
this module is imported, you should carefully review all uses of rexec.*
commands.
- SSI Use of the <!--#exec cmd="" --> tag.
- Java The java.lang package is imported implicitly by every class, so check
for uses of Runtime.exec().
- PHP Calls to the functions exec(), passthru(), and system().
- ColdFusion Use of the CFExecute and CFServlet tags.
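The difference between a risky call and a safer one is often just whether a command processor (shell) gets to interpret the tainted data. The Python sketch below passes user data as a single argument in a list, so no shell is involved and metacharacters are not interpreted. The run_echo function is hypothetical, and the child process is simply the Python interpreter itself, standing in for whatever external program an application might call.

```python
import subprocess
import sys


def run_echo(user_text: str) -> str:
    """Hand tainted data to an external program as one argv element.

    run_echo is a hypothetical example; the child here is the Python
    interpreter, used as a stand-in for a real external program.
    """
    completed = subprocess.run(
        # A list of arguments and no shell=True: characters such as ; | $
        # reach the child as plain text instead of being interpreted.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_text],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.rstrip("\n")
```

Note that this only prevents the execution of additional commands; tainted data can still change the behavior of the intended program if it is interpreted as a command-line option, so the arguments themselves still deserve review.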
Dynamic Code Execution
Many languages (especially the scripting languages, such as Perl, Python, TCL, and so
on) contain mechanisms to interpret and run native scripting code. For example, a
Python script can take raw Python code and execute it via the compile command.
This allows the program to "build" a subprogram dynamically or allow the user to
input scripting code (fragments). However, the scary part is that the subprogram has
all the privileges and functionality of the main program; if a user can insert his own
script code to be compiled and executed, he can effectively take control of the program
(limited only by the capabilities of the scripting language being used). This vulnerability
is typically limited to script-based languages.
The various commands that cause code compilation/execution include:
- TCL Uses of the eval and expr commands.
- Perl Uses of the eval function and do, and any regex operation with the e
modifier.
- Python Uses of the commands exec, compile, eval, execfile, and input.
- ASP Certain ASP interpreters may have Eval, Execute, and ExecuteGlobal
available.
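Where a program only needs to accept data (numbers, strings, lists) rather than code, there is usually a way to parse it without executing anything. As a hedged Python illustration, the sketch below uses ast.literal_eval from the standard library, which accepts only literal values and rejects function calls and other expressions. The wrapper name parse_user_literal is an assumption made for this example.

```python
import ast


def parse_user_literal(text: str):
    """Parse a Python literal (number, string, list, dict, and so on)
    without executing code. parse_user_literal is a hypothetical helper.
    """
    try:
        # Unlike eval, literal_eval refuses calls, attribute access,
        # and anything else that is not a plain literal.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError("input is not a plain literal") from exc
```

A call such as parse_user_literal("[1, 2, 3]") yields a list, whereas input containing a function call is rejected. Very large or deeply nested input can still exhaust memory or the interpreter stack, so a length check before parsing is sensible.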
External Objects/Libraries
Besides the dynamic generation and compilation of program code (discussed earlier),
a program can also choose to load or include a collection of code (commonly
referred to as a library) that is external to the program. These libraries typically
include common functions helpful in making the design of a program easier, specialty
functions meant to perform or aid in specific operations, or custom collections
of functions used to support your Web application. Regardless of what functions a
library may contain, you have to ensure the program loads the exact library
intended. An attacker may be able to coerce your program into loading an alternate
library, which could provide him an advantage. When you review your source code,
you must ensure that all external library loading routines do not use any sort of
tainted data.
NOTE: External library vulnerabilities are technically the same as the file system
interaction vulnerabilities discussed previously. However, external libraries
have a few associated nuances (particularly in the methods/functions
used to include them) that warrant them being a separate problem area.
The following is a quick list of functions used by the various languages to
import external modules. In all cases, you should review the actual modules being
imported, checking to see if it’s possible for a user to modify the importation process
(via tainted data in the module name, for example).
- Perl import, require, use, and do
- Python import and __import__
- ASP Server.CreateObject(), and the <OBJECT runat="server"> tag when
found in global.asa
- JSP jsp:useBean
- Java URLClassLoader and JarURLConnection from the java.net package;
ClassLoader, Runtime.load, Runtime.loadLibrary, System.load, and
System.loadLibrary from the java.lang package
- TCL load, source, and package require
- ColdFusion CFObject
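If a program genuinely must choose a module at run time, one defensive pattern is to compare the requested name against a fixed list before importing anything. The Python sketch below shows the idea using importlib.import_module from the standard library; ALLOWED_PLUGINS and load_plugin are hypothetical names, and the two standard-library modules in the list are placeholders for an application’s own modules.

```python
import importlib

# Hypothetical allow-list; a real application would list its own modules.
ALLOWED_PLUGINS = {"json", "csv"}


def load_plugin(user_choice: str):
    """Import a module only if its name appears on the fixed allow-list."""
    if user_choice not in ALLOWED_PLUGINS:
        raise ValueError("module is not on the allow-list")
    # The name reaching import_module is now one of a known set,
    # so tainted data cannot select an arbitrary library.
    return importlib.import_module(user_choice)
```

Code that passes a user-controlled string directly to an import routine, with no comparable check, is what you are looking for during review.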
Checking Structured Query Language (SQL)/Database Queries
This is a more recently emerging area of vulnerability, specifically due to the growing
use of databases in conjunction with Web applications. Obviously, databases make for
great central repositories for storing, parsing, and retrieving a variety of information.
The largest area of vulnerability lies in the use of the database SQL, which is a standard,
human-oriented query language used to perform operations on a database. The
specific vulnerability has to do with SQL being human-oriented, or better put,
being natural-language oriented. This means that an actual SQL query is designed to
be readable and understandable by humans, and that computers must first parse and
figure out exactly what the query was intended to do. Due to the nature of this
approach, an attacker may be able to modify the intent of the human-readable SQL
language, which in turn results in the database believing the query has a completely
different meaning.
NOTE: The exact level of risk associated with SQL-related vulnerabilities is
directly dependent on the particular database software you use and the
features that software provides.
But this isn’t the only SQL/database vulnerability. The significant areas of vulnerability
fall into one of two types:
- Connection setup
- Tampering with queries
During the setup of connections with a database, you need to look at the application
and determine where the application initially connects to the database.
Typically, a connection is made before queries can be run. The connection usually
contains authentication information: username, password, database server, table name,
and so on. This authentication information should be considered sensitive, and therefore
the application should be examined on how it stores this information prior to,
during, and after use (upon connecting to the database). Of course, none of the
authentication information used during connection setup should contain tainted
data; otherwise, the tainted data needs to be analyzed to determine if a user could
potentially supply or alter the credentials used to establish a connection to the
database server.
As discussed in Chapter 4, "Vulnerable CGI Scripts," when we talked
about SQL Injection, tampering with queries is a common vulnerability. The
dynamic nature of Web applications dictates that they somehow dynamically process
a user’s request. Databases allow the program (on behalf of the user) to query for a
particular set of data within the supplied parameters, and/or to store the resulting
data into the database for later use. The biggest problem is that this involves actually
inserting the tainted data into the query itself in some form or another. An attacker
may be able to submit data that, when inserted into a SQL query, will trick the
SQL/database server into executing different queries than the one intended. This
could allow an attacker to tamper with the data contained in the database, view
more data than was intended to be viewed (particularly records of other users), and
bypass authentication mechanisms that use user credentials stored in a database.
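To illustrate how query tampering is commonly avoided, the following Python sketch uses the sqlite3 module from the standard library with a parameterized query: the tainted value is handed to the database separately from the SQL text, so it cannot change the meaning of the query. The table layout and the make_demo_db and find_user functions are invented for this example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user with a parameterized query.

    The ? placeholder keeps the tainted value out of the SQL text, so
    quotes in the input are treated as data rather than as SQL syntax.
    """
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchall()
```

With this approach, a classic injection string such as ' OR '1'='1 is simply compared against the username column and matches nothing. When reviewing, queries assembled by concatenating user data into a string are the ones to trace back.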
Given the two problem areas, the following list of functions/commands will lead
you to potential problems:
- C/C++ Unfortunately, no "standard" library exists for accessing various
external databases. Therefore, you will have to do a little legwork on your
own and determine what function(s) are used to establish a connection to
the database and what function(s) are used to prepare/perform a query on
the database. After that’s determined, you just search for all uses of those
target functions.
- PHP Calls to the functions ifx_connect(), ifx_pconnect(), ifx_prepare(),
ifx_query(), msql_connect(), msql_pconnect(), msql_db_query(),
msql_query(), mysql_connect(), mysql_db_query(), mysql_pconnect(),
mysql_query(), odbc_connect(), odbc_exec(), odbc_pconnect(), odbc_prepare(),
ora_logon(), ora_open(), ora_parse(), ora_plogon(), OCILogon(),
OCIParse(), OCIPLogon(), pg_connect(), pg_exec(), pg_pconnect(),
sybase_connect(), sybase_pconnect(), and sybase_query().
- ASP Database connectivity is handled by the ADODB.* objects. This
means that if your script doesn’t create an ADODB.Connection or
ADODB.Recordset object via the Server.CreateObject function, you don’t
have to worry about your script containing ADO vulnerabilities. If your
script does create ADODB objects, you need to look at the Open methods
of the created objects.
- Java Java uses the JDBC (Java DataBase Connectivity) interface stored in
the java.sql module. If your application uses the java.sql module, you need
to look at the uses of the createStatement() and execute() methods.
- Perl Perl can use the generic database-independent DBI module, or the
database-specific DB::* modules. The functions exported by each module
widely vary, so you should determine which (if any) of the modules are
loaded and find the appropriate functions.
- ColdFusion The CFInsert, CFQuery, and CFUpdate tags handle interactions
with the database.
Checking Networking and Communication Streams
Checking all outgoing and incoming network connections and communication
streams used by a program is important. For example, your program may make an
FTP connection to a particular server to retrieve a file. Depending on where tainted
data is included, an attacker could modify which FTP server your program connects
to, what user credentials are presented, or which file is retrieved. It’s also very important
to know if the Web application sets up any listening server processes that answer
incoming network connections. Incoming network connections pose many problems,
because any vulnerability in the code controlling the listening service could
potentially allow a remote attacker to compromise the server. Worse, custom network
services, or services run in conjunction with unusual port assignments, may
subvert any intrusion detection or other attack-alert systems you may have set up to
monitor for attackers.
What follows is a list of various functions that allow your program to establish or
use network/communication streams:
- Perl and C/C++ Uses of the connect command indicate the application
is making outbound network connections. "Connect" is a common
name that may be found in other languages as well.
- Uses of the accept command mean the application is potentially listening
for inbound network connections. "Accept" is also a common
name that may be found in other languages.
- PHP Uses of the functions imap_open, imap_popen, ldap_connect,
ldap_add, mcal_open, fsockopen, pfsockopen, ftp_connect, ftp_login, and
mail.
- Python Uses of the socket.*, urllib.*, and ftplib.* modules.
- ASP Use of the Collaborative Data Objects (CDO) CDONTS.* objects;
in particular, watch for CDONTS.Attachment and the CDONTS.NewMail
AttachFile and AttachURL methods. An attacker might be able to trick your application
into attaching a file you don’t want to be sent out. This is similar to
the file system-based vulnerabilities described earlier.
- Java The inclusion of the java.net.* package(s), and especially the use
of ServerSocket (which means your application is listening for inbound
requests). Also, keep a watch for the inclusion of java.rmi.*. RMI is Java’s
remote method invocation, which is functionally similar to CORBA’s.
- ColdFusion Look for the tags CFFTP, CFHTTP, CFLDAP, CFMail, and
CFPOP.
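When tainted data influences where an outbound connection goes, a simple check of the destination against a fixed list limits what an attacker can do. The Python sketch below parses a URL with urllib.parse.urlsplit from the standard library and accepts it only if both the scheme and the host are expected. The names ALLOWED_HOSTS, ALLOWED_SCHEMES, and is_allowed_target, along with the example host, are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allow-lists; a real application would supply its own values.
ALLOWED_HOSTS = {"ftp.example.com"}
ALLOWED_SCHEMES = {"ftp", "https"}


def is_allowed_target(url: str) -> bool:
    """Return True only if the URL points at an expected scheme and host."""
    parts = urlsplit(url)
    # parts.hostname is the real host even when the URL contains
    # misleading user-info such as ftp://ftp.example.com@attacker.test/
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in ALLOWED_HOSTS
```

A check like this would run before any of the connection functions listed above are called with user-influenced data.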
PUTTING IT ALL TOGETHER
So, now that you have this large list of target functions/commands, how do you
begin to look for them in a program? Well, the answer varies slightly, depending on
your resources. On the simple side, you can use any editor or program with a built-in
search/find function (even a word processor will do). Just search for each listed
function, taking note of where it is used by the application and in what context.
Programs that can search multiple files at one time (such as UNIX grep) are much
more efficient; however, command-line utilities such as grep don’t let you interactively
scroll through the program. We enjoy the use of the GNU less program, which
allows you to view a file (or many files). It even has built-in search capability.
Windows users could use the DOS find command; Windows users may also
want to investigate the use of a shareware programming code editor by the name of
UltraEdit. UltraEdit (www.ultraedit.com) allows the visual editing of files and
searching within a file or across multiple files. If you are really hard-pressed for
searching multiple files on Windows, you can technically use the Windows Find
Files feature, which allows you to search a set of files for a specified string. As we
mentioned earlier in this chapter, SourceEdit from Brixoft (www.brixoft.net) can
also be used to review source code in numerous languages. On the extreme end,
uses of code and data modeling tools might point out subtle logic flaws and loops
that are otherwise hard to notice by normal review. Whichever tool you use, however,
ultimately the person who is performing the audit is best able to determine
major issues in the code.
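If none of those tools is at hand, a few lines of script can perform the same kind of search. The following Python sketch scans source text for calls to a short list of target functions and reports the line number, the function name, and the line itself. The TARGETS list is a small hypothetical sample (substitute the functions for your language from the lists in this chapter), and find_target_calls is a name invented for this example.

```python
import re

# Hypothetical sample; use the function lists for your own language.
TARGETS = ["system", "exec", "popen", "eval", "open"]


def find_target_calls(source_text: str, targets=TARGETS):
    """Return (line number, function name, stripped line) for each call found."""
    # Match a target name as a whole word followed by an opening parenthesis.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in targets) + r")\s*\("
    )
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(1), line.strip()))
    return hits
```

This simple pattern only finds calls written with parentheses, so languages such as Perl, where parentheses are optional, would need a looser pattern. As with grep, the output is only a starting list; each hit still has to be traced back by hand.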
SUMMARY
Making sure your Web applications are secure is a due-diligence issue many administrators
and programmers should undoubtedly perform, but lacking the expertise
and time to do so is sometimes an overriding factor. Therefore, it’s important to promote
a simple method of secure code review anyone can tackle. Looking for specific
problem areas and then tracing the program execution in reverse provides an efficient
and manageable approach for wading through large amounts of code. By
focusing on high-risk areas (buffer overflows, user output, file system interaction,
external programs, and database connectivity), you can easily remove a vast number
of common mistakes plaguing many Web applications found on the Net today.
SOLUTIONS FAST TRACK
How to Efficiently Trace through a Program
- Tracing a program’s execution from start to finish is too time intensive.
- You can save time by instead going directly to problem areas.
- This approach allows you to skip benign application processing/calculation
logic.
Auditing and Reviewing Selected Programming Languages
- Using popular and mature programming languages can help you audit the
code.
- Certain programming languages may have features that aid you in
efficiently reviewing the code.
Looking for Vulnerabilities
- Review how user data is collected.
- Check for buffer overflows.
- Analyze program output.
- Review file system interaction.
- Audit external component use.
- Examine database queries and connections.
- Track use of network communications.
Putting It All Together
- Use tools such as UNIX grep, GNU less, the DOS find command,
UltraEdit, or SourceEdit to look for the functions previously listed.
FREQUENTLY ASKED QUESTIONS
The following Frequently Asked Questions, answered by the authors of this book,
are designed to both measure your understanding of the concepts presented in
this chapter and to assist you with real-life implementation of these concepts. To
have your questions about this chapter answered by the author, browse to
www.syngress.com/solutions and click on the "Ask the Author" form.
Q: This is tedious. Do any automated tools do this work?
A: Due to the custom and dynamic nature of source code, it’s very hard to design a
tool that is capable of understanding what the developer intended and how an
attacker might subvert that. Tools such as SourceEdit help highlight some problem
areas, but these tools are far from becoming an automated replacement.
Q: Will outside companies check our source code for us?
A: We suggest you check SecurityFocus.com. SecurityFocus.com maintains a multivendor
security service offerings directory, which includes a list of companies
that perform formal code audits.
Q: Where can I find information online about potential threats and how to defend
against them?
A: Lincoln Stein has written the Web Security FAQ, available online at
www.w3.org/Security/Faq/www-security-faq.html. There is also the Secure
Programming for Linux and UNIX HOWTO (which includes C/C++, Java, TCL,
Python, and Perl) available at www.dwheeler.com/secure-programs.
Q: Where’s the best place to find out more information regarding secure coding in
my particular language?
A: The vendor of the particular programming language is definitely the best place
to start. However, some languages (such as C/C++, TCL, and so on) don’t have
official "vendors," but many support sites exist. For example, perl.com features
a wealth of information for Perl programmers.