Filtering Input in PHP
A quick primer.
This is meant to cover the basics of variable cleansing in PHP. It can become an advanced topic, but it is a must-read for programmers.
Proper filtering methods will result in better security in your application, and will lead to better stability as well.
Variables you'll want to filter
Any variable whose value was not generated by your own script in the current scope: anything arriving through $_GET, $_POST, $_COOKIE, and similar request data. This becomes even more important for those who have REGISTER_GLOBALS enabled in PHP (a major security problem; the setting was deprecated in PHP 5.3 and removed in PHP 5.4). To see an issue with register globals, check out Web Application Security with PHP. Alternately, read more about them on PHP.NET.
Filtering Functions
$val = strip_tags($tobefiltered,"<allowedtag>");
A working example:
$val = "<b><i>HELLO</i></b>";
$val = strip_tags($val,"<b>");
echo $val;
This will output:
<b>HELLO</b>
The <i> tag is removed, but the BOLD tag (B) stays because you allowed it to go through, so a browser shows HELLO in bold. It is imperative to note that strip_tags matches tag names literally: <b> and <strong> are different tags, and allowing one does not allow the other. You'd have to include both exceptions. The same applies to <i> and <em>.
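You SHOULD NOT rely on the exceptions, however, or at least I don't trust them: strip_tags leaves the attributes of an allowed tag untouched, so an inline event handler or style on an allowed <b> tag passes straight through. The best use of the function is as below: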
$val = strip_tags($val);
Notice I didn't specify the second argument, causing strip_tags to eliminate all tags.
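To see why the exceptions deserve caution, here is a minimal sketch. The sample string and variable names are made up for illustration. It compares the two calls on input that carries an attribute, and then escapes the result for HTML output with htmlspecialchars.

```php
<?php
// Hypothetical user input: a bold tag carrying an inline event handler.
$input = '<b onclick="steal()">Hi</b> <i>there</i>';

// Allowing <b> keeps the whole opening tag, attributes included.
$withBold = strip_tags($input, '<b>');

// With no second argument, every tag is removed.
$noTags = strip_tags($input);

// Escape whatever is left before printing it into an HTML page.
$safeForHtml = htmlspecialchars($noTags, ENT_QUOTES, 'UTF-8');

echo $withBold, "\n";    // <b onclick="steal()">Hi</b> there
echo $safeForHtml, "\n"; // Hi there
```

The first line of output still contains the onclick attribute, which is the reason to prefer the call without exceptions.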
Regular Expressions
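Regular expressions are a language of their own, and honestly, they vary by flavor (POSIX vs. Perl-compatible, vs. others). Thankfully, PHP offers a preg_replace function that uses a Perl-compatible RegEx engine (PCRE). This article was originally written around ereg_replace, PHP's older POSIX RegEx handler, but ereg_replace was deprecated in PHP 5.3 and removed in PHP 7.0, so use preg_replace on any current version. For the simple patterns here, the only difference is that preg_replace needs delimiters around the expression, as in "/[^0-9]/".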
RegEx is a very powerful tool to have, allowing you to validate e-mail addresses, dates, server logs, addresses, and best of all: all input.
Going through all possible requirements for filtering and validation (like those listed above) would take far too long, so I find it easier to point you to a dedicated resource on the subject. But for simple input, here's how it's used:
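$var = preg_replace("/Expression/","With","text");
Expression is the actual RegEx code, wrapped in delimiters (slashes here). preg_replace is shown because ereg_replace no longer exists as of PHP 7.0.
With is what you'll be replacing the "bad part" with. Typically you'll want to leave it blank, as in: "".
Text is the text to filter.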
$var = 'HELLO@#$ HOW ARE YOU?123!!345';
// preg_replace stands in for ereg_replace, which was removed in PHP 7.0
$var = preg_replace('/[^A-Za-z0-9]/', '', $var);
echo $var;
This will output:
HELLOHOWAREYOU123345
Reason being: the expression removes anything that isn't A-Za-z0-9 (capital and lowercase A through Z, and 0 through 9), and that includes spaces.
This is especially useful for the $_GET array:
For example: http://www.youn00b.com/?mode=view
You can filter the value of mode this way so that it cannot carry arbitrary text.
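As a rough sketch of how that might look for the mode parameter: the helper name clean_mode and the list of allowed modes ('view', 'edit') are assumptions for this example, not part of any real site. The value is stripped to letters and digits and then compared against the modes the script actually knows.

```php
<?php
// Hypothetical helper: clean a "mode" value and check it against known modes.
function clean_mode($raw, array $allowed, $default)
{
    // Request values can be arrays (?mode[]=x), so accept strings only.
    if (!is_string($raw)) {
        return $default;
    }
    // Keep letters and digits only.
    $mode = preg_replace('/[^A-Za-z0-9]/', '', $raw);
    // Fall back to the default for anything the script does not know.
    return in_array($mode, $allowed, true) ? $mode : $default;
}

$raw  = isset($_GET['mode']) ? $_GET['mode'] : null;
$mode = clean_mode($raw, array('view', 'edit'), 'view');
```

Checking against a fixed list is a design choice: the script then only ever sees a value it was written to handle.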
This is especially good for the $_COOKIE array too, for example, if you're using MD5() for a cookie value:
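$var = (isset($_COOKIE['mycookie']) && is_string($_COOKIE['mycookie'])) ? $_COOKIE['mycookie'] : '';
// md5() returns exactly 32 lowercase hex characters; match the whole value
if (preg_match('/\A[a-f0-9]{32}\z/', $var) === 1)
{
//cookie is good
}
else
{
//cookie is bad
}
This works because MD5() always returns 32 hexadecimal characters (0-9 and a-f). Matching the whole value matters: stripping the bad characters and then counting what is left would also accept a value that has 32 good characters mixed with any amount of junk. SHA1() always returns 40 hexadecimal characters, so change {32} to {40} for it.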
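A format check like this can be wrapped in a small reusable function. The sketch below is only an illustration, and the helper name is_hex_digest is made up for it. It matches the whole value against the digest format, so extra characters are rejected rather than ignored.

```php
<?php
// Hypothetical helper: true only if $value is exactly $length lowercase hex characters.
function is_hex_digest($value, $length)
{
    if (!is_string($value)) {
        return false;
    }
    // \A and \z anchor to the very start and end, so a trailing newline is rejected too.
    return preg_match('/\A[a-f0-9]{' . (int) $length . '}\z/', $value) === 1;
}

$good = md5('example');     // 32 hex characters
$bad  = $good . '<script>'; // same digest with junk appended

var_dump(is_hex_digest($good, 32));            // bool(true)
var_dump(is_hex_digest($bad, 32));             // bool(false)
var_dump(is_hex_digest(sha1('example'), 40));  // bool(true)
```

Passing 40 as the length covers the SHA1() case without a second function.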
What if you only want numbers to go through?
$var = 'HELLO@#$ HOW ARE YOU?123!!345';
// preg_replace stands in for ereg_replace, which was removed in PHP 7.0
$var = preg_replace('/[^0-9]/', '', $var);
echo $var;
Output: 123345
This is good for page IDs, like:
http://www.youn00b.com/articles-1234
Where 1234 is the article ID.
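For an ID like this, it can be safer to validate than to strip, because stripping turns "12ab34" into 1234, which is not the article anyone asked for. Here is a minimal sketch, assuming the last path segment has already been extracted into a string. The helper name article_id_from_slug is hypothetical.

```php
<?php
// Hypothetical helper: pull a positive integer ID out of a slug like "articles-1234".
function article_id_from_slug($slug)
{
    // The whole slug must be "articles-" followed by digits only.
    if (!is_string($slug) || !preg_match('/\Aarticles-([0-9]+)\z/', $slug, $m)) {
        return null;
    }
    // FILTER_VALIDATE_INT returns an int on success and false otherwise
    // (for example on overflow, leading zeros, or a value below min_range).
    $id = filter_var($m[1], FILTER_VALIDATE_INT, array('options' => array('min_range' => 1)));
    return $id === false ? null : $id;
}

var_dump(article_id_from_slug('articles-1234'));   // int(1234)
var_dump(article_id_from_slug('articles-12ab34')); // NULL
```

A null result lets the calling code show a "not found" page instead of guessing at an ID.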
Important Note on Arrays and Filtering
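You cannot filter an entire array by handing it to ereg_replace, which expects a single string; doing so loses the array's structure instead of cleaning its elements. In this case, you will have to step through and clean out each individual element. This would be a pain for a multidimensional array, but it's alright for short ones (like a $_POST array for a series of checkboxes). Note that preg_replace does accept a one-dimensional array as its subject and returns the filtered array, but stepping through still lets you deal with each element individually.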
This is done like this:
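// foreach replaces the while/list/each idiom; each() was removed in PHP 8.0
foreach ($_POST['postdata'] as $key => $val)
{
// preg_replace stands in for ereg_replace, which was removed in PHP 7.0
$cleanedoutput = preg_replace("/[^0-9]/","",$val);
//Use cleaned output here, or put into new array
}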
This example would have no output.
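For the multidimensional case, one possible approach is a small recursive function. This is only a sketch: the function name digits_only_recursive and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: keep only digits in every value of a (possibly nested) array.
function digits_only_recursive(array $data)
{
    $clean = array();
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // Nested array: clean it the same way, keeping its key.
            $clean[$key] = digits_only_recursive($value);
        } else {
            // Scalar value: cast to string and drop everything except 0-9.
            $clean[$key] = preg_replace('/[^0-9]/', '', (string) $value);
        }
    }
    return $clean;
}

$posted = array('a' => '12x', 'b' => array('3-4', 'abc'));
print_r(digits_only_recursive($posted));
```

The keys and nesting come back unchanged; only the values are filtered, and a value with no digits becomes an empty string.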
Replacing certain words / characters
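Many sites use this for a word filtering technique (for certain sites, and words), and some also use it to try to prevent SQL injection (by replacing TRUNCATE, DELETE, UPDATE, SELECT, etc in vulnerable code). Keyword replacement is easy to get around and mangles legitimate text that happens to contain those words, so treat it as a stopgap at best and use parameterized queries for anything that reaches the database.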
Use:
(For case insensitive:)
str_ireplace("replacethis","withthis","inthis")
(For case sensitive:)
str_replace("replacethis","withthis","inthis")
Example:
$var = "I HATE BAD CODE";
$var = str_ireplace("hate bad","love good",$var);
echo $var;
Output: I love good CODE
You can see how this can be effective for replacing, or even eliminating text:
$var = "I HATE BAD CODE";
$var = str_ireplace("bad ","",$var);
echo $var;
Output: I HATE CODE
A caution about using this for word filtering:
You may choose to replace certain "bad" words with nothing to keep your site clean, for example "Ass". People are creative, and will find ways around this, but a match isn't always someone trying to swear. For example, replacing "ass" with "" (case insensitive) could change this:
Passive
Assassinate
Assume
To:
Pive
inate
ume
While replacing " ass " with " " may be better, people will simply put "-ass-" or something similar to get around it, so careful implementation is a must if you're going this route.
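One option that sits between the two is a regular expression with word boundaries. The sketch below is an illustration rather than a complete word filter, and the helper name remove_whole_word is made up. It removes the word only where it stands alone, regardless of the punctuation around it.

```php
<?php
// Hypothetical helper: remove a word only where it stands alone.
function remove_whole_word($word, $text)
{
    // preg_quote escapes RegEx metacharacters in $word,
    // \b marks a word boundary, and the i flag ignores case.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_replace($pattern, '', $text);
}

echo remove_whole_word('ass', 'Passive Assassinate Assume'), "\n"; // Passive Assassinate Assume
echo remove_whole_word('ass', 'what an -ass- move'), "\n";         // what an -- move
```

This leaves innocent words alone and catches the "-ass-" variant, though spellings with dots or digits between the letters would still slip through.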
More questions?
Post them below and I'll help you out.