OWASP Guide to Building Secure Web Applications and Web Services, Chapter 12: Data Validation

This section of the OWASP Guide to Building Secure Web Applications and Web Services will help you ensure applications are secure against all forms of input data. Techniques explained include data integrity checks, validation and business rule validation.

This article is provided by special arrangement with the Open Web Application Security Project (OWASP). This article is covered by the Creative Commons Share-Alike Attribution 2.5 license. You can find the latest version of this article and more free and open application security tools and documentation at http://www.owasp.org.

Data Validation

The objective is to ensure that the application is robust against all forms of input data, whether
obtained from the user, infrastructure, external entities or database systems.

Platforms Affected

Relevant COBIT Topics
DS11 – Manage Data. All sections should be reviewed.

The most common web application security weakness is the failure to properly
validate input from the client or environment. This weakness leads to almost all of
the major vulnerabilities in applications, such as interpreter injection, locale/Unicode
attacks, file system attacks and buffer overflows.

Data from the client should never be trusted, because the client has every opportunity
to tamper with it.

These definitions are used within this document:

  • Integrity checks
    Ensure that the data has not been tampered with and is the same as before
  • Validation
    Ensure that the data is strongly typed, has correct syntax, is within length boundaries, contains only permitted characters and, if numeric, is correctly signed and within range boundaries
  • Business rules
    Ensure that data is not only validated, but also correct according to business rules. For example, interest rates
    fall within permitted boundaries.

Some documentation and references use these terms interchangeably, which is
very confusing to all concerned. This confusion directly causes continuing
financial loss to the organization.

Where to include integrity checks
Integrity checks must be included wherever data passes from a trusted to a less
trusted boundary, such as from the application to the user's browser in a hidden field,
or to a third party payment gateway, such as a transaction ID used internally upon
return.

The type of integrity control (checksum, HMAC, encryption, digital signature)
should be directly related to the risk of the data transiting the trust boundary.
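As a minimal sketch of the HMAC option, assuming a secret key held only on the server and a hypothetical IntegrityTag helper (not part of any framework), the following Java class appends an HMAC-SHA256 tag to a value such as a transaction ID before it crosses the boundary, and checks the tag when the value comes back:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: tags a value with HMAC-SHA256 so tampering can be detected.
final class IntegrityTag {
    private final SecretKeySpec key; // secret that never leaves the server

    IntegrityTag(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Returns "value.tag"; the Base64url tag alphabet never contains '.'
    String protect(String value) {
        return value + "." + tag(value);
    }

    // Returns the original value if its tag verifies, otherwise null.
    String verify(String protectedValue) {
        int dot = protectedValue.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String value = protectedValue.substring(0, dot);
        byte[] expected = tag(value).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = protectedValue.substring(dot + 1).getBytes(StandardCharsets.US_ASCII);
        // MessageDigest.isEqual examines every byte, so timing does not reveal
        // how much of a forged tag was correct
        return MessageDigest.isEqual(expected, actual) ? value : null;
    }

    private String tag(String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] raw = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

The client sees the value and its tag but not the key. A plain checksum would not be sufficient here, because anyone who alters the value can simply recompute an unkeyed checksum.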

Where to include validation
Validation must be performed on every tier. However, validation should be
performed as per the function of the server executing the code. For example, the
web / presentation tier should validate for web related issues, persistence layers
should validate for persistence issues such as SQL / HQL injection, directory lookups
should check for LDAP injection, and so on.

Where to include business rule validation
Business rules are known during design, and they influence implementation.
However, there are bad, good and "best" approaches. Often the best approach is the
simplest in terms of code.

Example - Scenario

  • You are to populate a list with accounts provided by the back-end system.
  • The user will choose an account, choose a biller, and press next.

Wrong Way
The account select option is read directly and provided in a message back to the
backend system without validating that the account number is one of the accounts
provided by the backend system.

Why this is bad:
An attacker can change the HTML in any way they choose, so the submitted account
number cannot be trusted. In addition:

  • The lack of validation requires a round trip to the backend to produce an error
    message that the front-end code could easily have avoided.
  • The backend may not be able to cope with a malicious payload that the front-end code
    could easily have rejected. For example, buffer overflows, XML injection, or similar.

Acceptable Method
The account select option parameter is read by the code, and compared to the
previously rendered list.

// Proceed only if the submitted value is in the previously rendered list
if (account.inList(request.getParameter("payeelstid"))) { /* ... */ }

This prevents parameter tampering, but still makes the browser do a lot of work.

Best Method
The original code emitted indexes rather than account numbers:

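// parseInt throws NumberFormatException on non-numeric input
int payeeLstId = Integer.parseInt(request.getParameter("payeelstid"));
// getAcctNumberByIndex must reject indexes outside the rendered list
accountFrom = account.getAcctNumberByIndex(payeeLstId);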

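Not only is this easier to render in HTML, it makes validation and business rule
validation trivial. The field can still be altered, but tampering gains nothing: any
submitted index either maps to one of the accounts rendered for this user or is rejected.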
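A minimal sketch of the index lookup, assuming the rendered account list is kept on the server (for example in the session) and that indexes are rendered starting at 1. AccountSelection is a hypothetical name, not a framework class:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: maps a submitted list index back to the account
// numbers that were rendered for this user.
final class AccountSelection {
    private final List<String> renderedAccounts; // kept server-side, never sent as values

    AccountSelection(List<String> renderedAccounts) {
        this.renderedAccounts = List.copyOf(renderedAccounts);
    }

    // Returns the account for an index such as "1", or empty if the parameter
    // is missing, not a short run of digits, or outside the rendered list.
    Optional<String> accountFor(String submittedIndex) {
        if (submittedIndex == null || !submittedIndex.matches("[0-9]{1,3}")) {
            return Optional.empty();
        }
        int index = Integer.parseInt(submittedIndex);
        if (index < 1 || index > renderedAccounts.size()) {
            return Optional.empty();
        }
        return Optional.of(renderedAccounts.get(index - 1)); // indexes rendered as 1..n
    }
}
```

Because the page only ever carries small integers, an attacker who edits the field can at most select another of their own accounts.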

To provide defense in depth, and to prevent attack payloads from crossing trust boundaries
into backend hosts, which are probably incapable of handling arbitrary input data,
business rule validation is to be performed (preferably in workflow or command
patterns), even if it is known that the back end code performs business rule
validation.

This is not to say that the entire set of business rules need be applied - it
means that the fundamentals are performed to prevent unnecessary round trips to the
backend and to prevent the backend from receiving most tampered data.
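The fundamentals can be sketched as a command object whose validate() method runs before anything is sent to the backend. This is a minimal sketch, assuming a hypothetical PaymentCommand and an assumed per-payment limit; it checks only basic rules and leaves the full rule set to the backend:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical command: fundamental business-rule checks before execution.
final class PaymentCommand {
    private static final BigDecimal MAX_AMOUNT = new BigDecimal("10000.00"); // assumed limit

    private final Set<String> userAccounts; // accounts rendered for this user
    private final String fromAccount;
    private final BigDecimal amount;

    PaymentCommand(Set<String> userAccounts, String fromAccount, BigDecimal amount) {
        this.userAccounts = userAccounts;
        this.fromAccount = fromAccount;
        this.amount = amount;
    }

    // Returns the rule violations; an empty list means the command may proceed.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (fromAccount == null || !userAccounts.contains(fromAccount)) {
            errors.add("unknown source account");
        }
        if (amount == null || amount.signum() <= 0) {
            errors.add("amount must be positive");
        } else if (amount.compareTo(MAX_AMOUNT) > 0) {
            errors.add("amount exceeds limit");
        }
        return errors;
    }
}
```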

Data Validation Strategies
There are four strategies for validating data, and they should be used in this order:

Accept known good
If you expect a postcode, validate for a postcode (type, length and syntax):

public String validateAUpostCode(String postcode) {
 // Requires java.util.regex.Pattern; returns "" unless the input is a valid AU postcode
 return Pattern.matches("^(((2|8|9)\\d{2})|((02|08|09)\\d{2})|([1-9]\\d{3}))$", postcode) ? postcode : "";
}

Reject known bad
If you don't expect to see characters such as %3f or JavaScript or similar, reject
strings containing them:

public String removeJavascript(String input) {
 // Requires java.util.regex.Pattern; returns "" if "javascript" appears in any letter case
 Pattern p = Pattern.compile("javascript", Pattern.CASE_INSENSITIVE);
 return p.matcher(input).find() ? "" : input;
}

It can take upwards of 90 regular expressions (see the CSS Cheat Sheet in the
Guide 2.0) to eliminate known malicious input, and each regex needs to be run
over every field. Obviously, this is slow and not secure.

Sanitize
Eliminate or translate characters (such as to HTML entities or to remove quotes) in
an effort to make the input "safe":

public String quoteApostrophe(String input) {
 // Replace each ASCII apostrophe with a typographic right single quote
 return input.replaceAll("[']", "’");
}

This does not work well in practice, as there are many, many exceptions to the rule.

No validation

public void setAcctId(String acctId) {
 // Stores whatever the caller supplies, without any checks
 cAcctId = acctId;
}

This is inherently unsafe and strongly discouraged. The business must sign off on
each and every instance of no validation, as the lack of validation usually leads to
the direct bypass of application, host and network security controls.

Just rejecting "current known bad" (which is at the time of writing hundreds of
strings and literally millions of combinations) is insufficient if the input is a string.
This strategy is directly akin to anti-virus pattern updates. Unless the business will
allow updating "bad" regexes on a daily basis and support someone to research new
attacks regularly, this approach will soon be bypassed.

As most fields have a particular grammar, it is simpler, faster, and more secure
to validate against a single positive test than to include complex and
slow sanitization routines for all current and future attacks.

Data should be:

  • Strongly typed at all times
  • Length checked, with field lengths minimized
  • Range checked if numeric
  • Unsigned unless required to be signed
  • Syntax or grammar checked prior to first use or inspection
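As a minimal sketch of these rules applied to a single field, assuming a hypothetical numeric "quantity" parameter with an assumed range of 1 to 500, the following Java method checks length first, then syntax, then converts to an int and checks the range:

```java
import java.util.OptionalInt;

// Hypothetical field validator for a "quantity" parameter.
final class QuantityField {
    private static final int MAX_LENGTH = 3; // three digits are enough for MAX
    private static final int MIN = 1;
    private static final int MAX = 500;      // assumed business maximum

    static OptionalInt parse(String raw) {
        // Length check before any other processing
        if (raw == null || raw.isEmpty() || raw.length() > MAX_LENGTH) {
            return OptionalInt.empty();
        }
        // Syntax check: ASCII digits only, so no sign, spaces or other characters
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
        }
        // Strong typing: from here on the value is an int, not a String
        int value = Integer.parseInt(raw);
        // Range check
        return (value >= MIN && value <= MAX) ? OptionalInt.of(value) : OptionalInt.empty();
    }
}
```

Checking for ASCII digits explicitly rejects any sign, so the value is effectively unsigned; it also rejects non-ASCII digits, which Integer.parseInt on its own would accept.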

Coding guidelines should use some form of visible tainting on input from the
client or untrusted sources, such as third party connectors, to make it obvious that the
input is unsafe:

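// The "taint" prefix marks raw input; Validation is a hypothetical helper class
String taintPostcode = request.getParameter("postcode");
Validation validation = new Validation();
String postcode = validation.isPostcode(taintPostcode);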
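A naming convention relies on discipline. The same idea can be enforced by the type system: as a minimal sketch, assuming a hypothetical Tainted wrapper class, raw input is wrapped as soon as it is read, and the only way to get a plain String back out is through a positive validation rule:

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical wrapper marking untrusted input at the type level.
final class Tainted {
    private final String raw;

    private Tainted(String raw) {
        this.raw = raw;
    }

    // Wrap input at the point it is read (request parameter, header, connector).
    static Tainted of(String raw) {
        return new Tainted(raw);
    }

    // Returns the value only if it passes the supplied positive validation rule.
    Optional<String> validate(Predicate<String> rule) {
        return (raw != null && rule.test(raw)) ? Optional.of(raw) : Optional.empty();
    }

    @Override
    public String toString() {
        return "Tainted[****]"; // avoid leaking raw input into logs or pages by accident
    }
}
```

With this, the postcode would be read as Tainted.of(request.getParameter("postcode")), and code that expects a String cannot receive it until a validator has accepted it.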

Prevent parameter tampering
There are many input sources:

  • HTTP headers, such as REMOTE_ADDR, PROXY_VIA or similar
  • Environment variables, such as getenv() or via server properties
  • All GET, POST and Cookie data. This includes supposedly tamper-resistant fields such as
    radio buttons, drop-downs, etc.; any client-side HTML can be rewritten to suit the attacker.
  • Configuration data (mistakes happen :))
  • External systems (via any form of input mechanism, such as XML input, RMI,
    web services, etc.)

All of these data sources supply untrusted input. Data received from untrusted
data sources must be properly checked before first use.

Hidden fields
Hidden fields are a simple way to avoid storing state on the server. Their use is
particularly prevalent in "wizard-style" multi-page forms. However, their use exposes
the inner workings of your application, and exposes data to trivial tampering, replay,
and validation attacks. In general, only use hidden fields for page sequence.

If you have to use hidden fields, there are some rules:

  • Secrets, such as passwords, should never be sent in the clear.
  • Hidden fields need integrity checks and should preferably be encrypted using non-constant
    initialization vectors (i.e. different users at different times have different yet
    cryptographically strong random IVs).
  • Encrypted hidden fields must be robust against replay attacks, which means some
    form of temporal keying.
  • Data sent to the user must be validated on the server once the last page has been
    received, even if it has been previously validated on the server - this helps reduce the risk
    from replay attacks.
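The rules above can be sketched with the standard Java cryptography classes. This is a minimal sketch under stated assumptions, not a vetted implementation: HiddenFieldCipher is a hypothetical name, AES-GCM is swapped in as one authenticated-encryption choice (it provides both encryption and the integrity check), and an embedded timestamp with an assumed 15-minute window stands in for temporal keying. The timestamp limits replay to that window but does not prevent replay within it, which is why the final server-side validation rule still applies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Optional;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical hidden-field protector: AES-GCM with a fresh IV per value.
final class HiddenFieldCipher {
    private static final int IV_BYTES = 12;                     // 96-bit IV, usual for GCM
    private static final int TAG_BITS = 128;                    // GCM authentication tag
    private static final long MAX_AGE_MILLIS = 15 * 60 * 1000L; // assumed replay window
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key; // AES key that never leaves the server

    HiddenFieldCipher(SecretKey key) {
        this.key = key;
    }

    // Encrypts the issue time plus the value under a new random IV.
    String seal(String value, long nowMillis) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // never reuse an IV with the same key
            byte[] text = value.getBytes(StandardCharsets.UTF_8);
            byte[] plain = ByteBuffer.allocate(Long.BYTES + text.length)
                    .putLong(nowMillis).put(text).array();
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plain);
            byte[] out = ByteBuffer.allocate(IV_BYTES + sealed.length).put(iv).put(sealed).array();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM unavailable or key invalid", e);
        }
    }

    // Returns empty if the field was altered, is malformed, or is outside the window.
    Optional<String> open(String field, long nowMillis) {
        try {
            byte[] in = Base64.getUrlDecoder().decode(field);
            if (in.length < IV_BYTES + Long.BYTES + TAG_BITS / 8) {
                return Optional.empty();
            }
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            // doFinal throws AEADBadTagException if any byte was changed
            ByteBuffer plain = ByteBuffer.wrap(cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES));
            long issued = plain.getLong();
            if (issued > nowMillis || nowMillis - issued > MAX_AGE_MILLIS) {
                return Optional.empty();
            }
            byte[] text = new byte[plain.remaining()];
            plain.get(text);
            return Optional.of(new String(text, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return Optional.empty(); // bad tag, bad Base64, or unusable parameters
        }
    }
}
```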

The preferred integrity control should be at least an HMAC using SHA-256, or
preferably a digital signature or PGP encryption. IBMJCE supports SHA-256, but PGP
JCE support requires the inclusion of the Legion of the Bouncy Castle
(http://www.bouncycastle.org/) JCE classes.

It is simpler to store this data temporarily in the session object. Using the
session object is the safest option as data is never visible to the user, requires (far)
less code, nearly no CPU, disk or I/O utilization, less memory (particularly on large
multi-page forms), and less network consumption.

In the case of the session object being backed by a database, large session
objects may become too large for the inbuilt handler. In this case, the recommended
strategy is to store the validated data in the database, but mark the transaction as
"incomplete". Each page will update the incomplete transaction until it is ready for
submission. This minimizes the database load, session size, and activity between the
users whilst remaining tamperproof.

Code containing hidden fields should be rejected during code reviews.

ASP.NET Viewstate
ASP.NET sends form data back to the client in a hidden "Viewstate" field. Despite
looking forbidding, this "encryption" is merely Base64 encoding, the equivalent of plain
text, and in ASP.NET 1.1 it has no data integrity protection unless you take further
action. In ASP.NET 2.0, tamper proofing is on by default.

Any application framework with a similar mechanism might be at fault. You
should investigate your application framework's support for sending data back to the
user; preferably, such data should not round-trip through the client at all.

How to determine if you are vulnerable
Investigate the machine.config:

  • If enableViewStateMac is not set to "true", you are at risk if your viewstate
    contains authorization state.
  • If viewStateEncryptionMode is not set to "Always", you are at risk if your
    viewstate contains secrets such as credentials.
  • If you share a host with many other customers, you all share the same machine key
    by default in ASP.NET 1.1. In ASP.NET 2.0, it is possible to configure unique viewstate
    keys per application.

How to protect yourself

  • If your application relies on data returning from the viewstate without being
    tampered with, you should turn on viewstate integrity checks at the least, and strongly
    consider the remaining measures in this list.
  • Encrypt viewstate if any of the data is application sensitive.
  • Upgrade to ASP.NET 2.0 as soon as practical if you use shared hosting.
  • Move truly sensitive viewstate data to the session variable instead.

Selects, radio buttons, and checkboxes
It is a commonly held belief that the values of these items cannot easily be
tampered with. This is wrong. In the following example, actual account numbers are used,
which can lead to compromise:

<html:radio value="<%=acct.getCardNumber(1).toString()%>" property="acctNo"/>
<bean:message key="msg.card.name" arg0="<%=acct.getCardName(1).toString()%>" />
<html:radio value="<%=acct.getCardNumber(2).toString()%>" property="acctNo"/>
<bean:message key="msg.card.name" arg0="<%=acct.getCardName(2).toString()%>" />

This produces (for example):

<input type="radio" name="acctNo" value="455712341234">Gold Card
<input type="radio" name="acctNo" value="455712341235">Platinum Card

If the value is retrieved and then used directly in a SQL query, an interesting
form of SQL injection may occur: authorization tampering leading to information
disclosure. As the connection pool connects to the database using a single user, it may
be possible to see other users' accounts if the SQL looks something like this:

String acctNo = request.getParameter("acctNo");
// The ? placeholder must not be quoted, or JDBC treats it as a literal
String sql = "SELECT acctBal FROM accounts WHERE acctNo = ?";
PreparedStatement st = conn.prepareStatement(sql);
st.setString(1, acctNo);
ResultSet rs = st.executeQuery();

This should be rewritten to retrieve the account number via an index, and to
include the client's unique ID to ensure that other valid account numbers are not exposed:

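// getCardNumber() maps the submitted index to one of this user's own accounts
String acctNo = acct.getCardNumber(Integer.parseInt(request.getParameter("acctIndex")));

String sql = "SELECT acctBal FROM accounts WHERE acct_id = ? AND acctNo = ?";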
PreparedStatement st = conn.prepareStatement(sql);
st.setString(1, acct.getID());
st.setString(2, acctNo);
ResultSet rs = st.executeQuery();

This approach requires rendering input values from 1 to x, and assumes that
accounts are stored in a Collection which can be iterated using logic:iterate:

<logic:iterate id="loopVar" name="MyForm" property="values">
 <html:radio property="acctIndex" idName="loopVar" value="value"/>
 <bean:write name="loopVar" property="name"/> <br />
</logic:iterate>

The code will emit HTML with the values "1" .. "x" as per the collection's contents:

<input type="radio" name="acctIndex" value="1" />Gold Credit Card
<input type="radio" name="acctIndex" value="2" />Platinum Credit Card

This approach should be used for any input type that allows a value to be set:
radio buttons, checkboxes, and particularly select / option lists.

Per-User Data
In fully normalized databases, the aim is to minimize the amount of repeated
data. However, some data is inferred. For example, users can see messages that are
stored in a messages table. Some messages are private to the user. However, in a
fully normalized database, the list of message IDs is kept within another table.

If a user marks a message for deletion, the usual way is to recover the message
ID from the user, and delete that:

DELETE FROM message WHERE msgid='frmMsgId'

However, how do you know if the user is eligible to delete that message ID?
Such tables need to be denormalized slightly to include a user ID or make it easy to
perform a single query to delete the message safely. For example, by adding back an
(optional) uid column, the delete is now made reasonably safe:

DELETE FROM message WHERE uid='session.myUserID' and msgid='frmMsgId';

Where the data is potentially both a private resource and a public resource (for
example, in the secure message service, broadcast messages are just a special type of
private message), additional precautions need to be taken to prevent users from
deleting public resources without authorization. This can be done using role based
checks, as well as using SQL statements to discriminate by message type:

DELETE FROM message
WHERE uid='session.myUserID' AND
msgid='frmMsgId' AND
broadcastFlag = false;

URL encoding
Data sent via the URL, which is strongly discouraged, should be URL encoded and
decoded. This reduces the likelihood that cross-site scripting attacks will succeed.
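A minimal sketch using the standard java.net.URLEncoder and URLDecoder classes (the Charset overloads used here need Java 10 or later). UrlValues is a hypothetical helper name. URLEncoder applies form encoding, where a space becomes "+", which suits query-string values:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encode each query-string value with an explicit charset,
// and decode it exactly once on the way back in.
final class UrlValues {
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    // Encodes the name and value separately so ?, & and = keep their meaning.
    static String link(String base, String name, String value) {
        return base + "?" + encode(name) + "=" + encode(value);
    }
}
```

For example, UrlValues.link("/search", "q", "<script>") yields "/search?q=%3Cscript%3E".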

In general, do not send data via GET request unless for navigational purposes.

HTML encoding
Data sent to the user needs to be safe for the user to view. This can be done using
<bean:write ... > and friends. Do not use <%=var%> unless it is used to supply
an argument for <bean:write... > or similar.

HTML encoding translates a range of characters into their HTML entities. For
example, > becomes &gt;. This will still display as > in the user's browser, but it is a
safe alternative.
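In a Struts page, <bean:write> performs this translation. To show which characters are involved, here is a minimal sketch of an entity encoder for text placed between HTML tags. Html is a hypothetical helper name, and JavaScript, CSS and URL contexts need their own encoders:

```java
// Hypothetical helper: translates HTML-significant characters into entities.
final class Html {
    static String encode(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // so existing entities are not re-interpreted
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```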

Encoded strings
Some strings may be received in encoded form. It is essential to send the correct
locale to the user so that the web server and application server can provide a single
level of canonicalization prior to the first use.

Do not use getReader() or getInputStream() as these input methods do not
decode encoded strings. If you need to use these constructs, you must canonicalize
data by hand.
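A minimal sketch of canonicalizing by hand, assuming the raw value is URL (form) encoded in UTF-8 and using a hypothetical Canonical helper: decode exactly one level with java.net.URLDecoder, and reject the value if escape sequences survive that single decode, since that indicates double encoding. The reject-on-double-encoding policy is an assumption of this sketch:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: one level of URL decoding, rejecting double encoding.
final class Canonical {
    // A percent sign followed by two hex digits is a URL escape sequence
    private static final Pattern ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    static Optional<String> decodeOnce(String raw) {
        String once;
        try {
            once = URLDecoder.decode(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // malformed escape such as "%zz" or a trailing "%"
        }
        // Escapes that survive one decode mean the input was encoded twice,
        // e.g. "%253Cscript%253E" becomes "%3Cscript%3E" here
        if (ESCAPE.matcher(once).find()) {
            return Optional.empty();
        }
        return Optional.of(once);
    }
}
```

Rejecting such input, rather than decoding it again, keeps to the single level of canonicalization described above.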

Delimiter and special characters
There are many characters that mean something special to various programs. If
you followed the advice only to accept characters that are considered good, it is very
likely that only a few delimiters will catch you out.

Here are the usual suspects:

  • NULL (zero) %00
  • LF - ANSI chr(10) "\n"
  • CR - ANSI chr(13) "\r"
  • CRLF - "\r\n"
  • CR - EBCDIC 0x0f
  • Quotes " '
  • Commas, slashes, spaces, tabs and other white space - used in CSV, tab-delimited
    output, and other specialist formats
  • <> - XML and HTML tag markers, redirection characters
  • ; & - Unix and NT file system continuance
  • @ - used for e-mail addresses
  • 0xff
  • ... more

Whenever you code to a particular technology, you should determine which
characters are "special" and prevent them from appearing in input, or properly escape them.
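As one example of properly escaping for a particular technology, here is a minimal sketch for CSV output, assuming the common RFC 4180 convention: a field containing a comma, double quote, CR or LF is wrapped in double quotes, and embedded double quotes are doubled. Csv is a hypothetical helper name:

```java
// Hypothetical helper: escapes fields for CSV output per RFC 4180.
final class Csv {
    static String field(String value) {
        if (value == null) {
            return "";
        }
        boolean special = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\r') >= 0 || value.indexOf('\n') >= 0;
        if (!special) {
            return value;
        }
        // Quote the whole field and double any embedded quotes
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins escaped fields with commas and ends the record with CRLF.
    static String row(String... values) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(field(values[i]));
        }
        return out.append("\r\n").toString();
    }
}
```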
