PHP strip_tags() Function

PHP strip_tags() - Strip HTML Tags

The strip_tags() function in PHP removes HTML and PHP tags from a string. It is useful when you are cleaning user input, preparing text for display, or extracting plain text from HTML content. It is not a complete defense against XSS on its own, as the later sections explain.

In this tutorial, you will learn how to use the PHP strip_tags() function effectively. We'll cover prerequisites, basic usage, detailed examples, best practices, common mistakes, and even interview questions tailored to this topic.

Prerequisites

  • Basic knowledge of PHP programming
  • Understanding of strings and HTML tags
  • PHP installed locally or via a web server for testing

Setup Steps

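  1. Install PHP. Any currently supported version works for the examples here; passing the allowed tags as an array requires PHP 7.4 or later. You can use XAMPP, MAMP, or simply install PHP on your OS.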
  2. Create a PHP file, for example strip-tags-demo.php.
  3. Open your favorite code editor (VSCode, PhpStorm, Sublime, etc.) and start writing your PHP script.
  4. Run the script via command line with php strip-tags-demo.php or through your web server.

Introduction to strip_tags()

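The strip_tags() function removes HTML and PHP tags from a string. HTML comments and NUL bytes are removed as well. The syntax is:

strip_tags(string $string, array|string|null $allowed_tags = null): string
  • $string: The input string containing tags to remove.
  • $allowed_tags: Optional parameter specifying the tags to keep, given as a string such as "<p><a>" or, since PHP 7.4, as an array of tag names such as ['p', 'a'].

The function returns the string stripped of all tags, except those specified as allowed. HTML comments and PHP tags are always removed, regardless of the allowed tags.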
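The two ways of passing the allowed tags can be compared side by side. This is a minimal sketch that assumes PHP 7.4 or later for the array form; the variable names are only illustrative.

```php
<?php
// Requires PHP 7.4+ for the array form of the second argument.
$input = '<p>Hello <strong>World</strong> and <em>you</em>!</p>';

// String form: tag names wrapped in angle brackets and concatenated.
$fromString = strip_tags($input, '<strong><em>');

// Array form: bare tag names, without angle brackets.
$fromArray = strip_tags($input, ['strong', 'em']);

echo $fromString, PHP_EOL; // Hello <strong>World</strong> and <em>you</em>!
var_dump($fromString === $fromArray); // bool(true)
```

Both calls keep <strong> and <em> and drop the surrounding <p>.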

Basic Examples

Example 1: Strip all HTML tags

<?php
$input = "<p>Hello <strong>World</strong>!</p>";
$output = strip_tags($input);
echo $output; // Output: Hello World!
?>

Explanation: All <p> and <strong> tags are removed, returning plain text.

Example 2: Allow specific tags

<?php
$input = "<p>Hello <strong>World</strong>!</p>";
$output = strip_tags($input, "<strong>");
echo $output; // Output: Hello <strong>World</strong>!
?>

Explanation: The <strong> tag is preserved, while others like <p> are removed.

Example 3: Stripping tags from PHP code embedded in strings

<?php
$input = "This is a test <?php echo 'Hello!'; ?>";
$output = strip_tags($input);
echo $output; // Output: This is a test 
?>

Explanation: strip_tags() also removes PHP tags, together with the code between the opening and closing tag.

Best Practices

  • Always escape user input at output time to prevent XSS. strip_tags() can remove markup first, but it does not replace escaping or validation.
  • Allow only the tags you actually need when some formatting is required, and remember that every attribute on an allowed tag is kept.
  • Combine with htmlspecialchars() for plain-text output, or use a dedicated HTML sanitizing library when you must accept HTML.
  • Check encoding and character sets, as strip_tags() does not handle encoding issues.
  • Prefer a real HTML parser for whole documents. strip_tags() does not understand document structure, so for example the contents of <script> and <style> elements remain in the output.
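One way to combine the practices above is to strip tags and then escape the result for an HTML text context. The sketch below assumes UTF-8 input; the helper name plain_text_for_html() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: remove tags, then escape what is left for HTML output.
function plain_text_for_html(string $input): string
{
    $text = strip_tags($input);

    // Escape &, <, > and both quote styles; replace invalid UTF-8 sequences.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<b>Tom</b> & "Jerry"';
echo plain_text_for_html($comment), PHP_EOL; // Tom &amp; &quot;Jerry&quot;
```

Escaping is what makes the output safe to place in HTML text; stripping only decides which markup is discarded beforehand.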

Common Mistakes

  • Assuming strip_tags() fully protects against XSS. It does not; always combine it with output escaping.
  • Expecting strip_tags() to decode HTML entities. It does not; use html_entity_decode() separately, and keep in mind that decoding can reintroduce tags.
  • Passing the allowable tags in the wrong format. Use a string such as "<p><a>" or, since PHP 7.4, an array of bare tag names such as ['p', 'a'].
  • Forgetting that attributes on allowed tags are kept exactly as written, not sanitized. Attributes such as onclick or style survive on any tag you allow.
  • Running it on entire HTML documents and expecting clean text. It ignores document structure, so it may keep unwanted content or remove more than intended.
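Two of these mistakes are easy to reproduce. The following sketch uses made-up sample strings to show that attributes on an allowed tag are left in place and that entities are not decoded.

```php
<?php
// Attributes on an allowed tag survive untouched, including event handlers.
$link = '<a href="#" onclick="alert(1)">click</a>';
$kept = strip_tags($link, '<a>');
echo $kept, PHP_EOL; // <a href="#" onclick="alert(1)">click</a>

// Entities are not decoded, so entity-encoded markup passes through unchanged.
$encoded = '&lt;b&gt;bold&lt;/b&gt;';
$stripped = strip_tags($encoded);
echo $stripped, PHP_EOL; // &lt;b&gt;bold&lt;/b&gt;

// Decoding afterwards brings the tag back, so strip again if plain text is the goal.
$decoded = html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');
echo $decoded, PHP_EOL;             // <b>bold</b>
echo strip_tags($decoded), PHP_EOL; // bold
```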

Interview Questions

Junior Level Questions

  1. What does the PHP strip_tags() function do?
    It removes all HTML and PHP tags from a string, returning plain text.
  2. How do you allow specific tags when using strip_tags()?
    By passing the allowable tags as the second parameter in a string, like "<b><i>".
  3. Can strip_tags() remove PHP tags such as <?php ?>?
    Yes, it removes both HTML and PHP tags.
  4. What will strip_tags("<p>Hello</p>") output?
    Hello
  5. Is strip_tags() case sensitive when specifying allowable tags?
    No, allowable tags are case-insensitive, but it is recommended to use lowercase for consistency.

Mid Level Questions

  1. What is the correct format to specify multiple allowable tags in strip_tags()?
    A single string containing tags in brackets, e.g. "<b><i><u>".
  2. Does strip_tags() remove content inside script or style tags?
    No, it only removes the tags themselves; the script or style source between them remains in the output as plain text.
  3. Can you use strip_tags() to safely prevent XSS attacks?
    No, it helps but does not fully prevent XSS; additional input validation and escaping are required.
  4. What happens if you pass an array instead of a string as the second parameter in strip_tags()?
    Since PHP 7.4, an array of tag names such as ['b', 'i'] is accepted and behaves like the string "<b><i>". Earlier versions support only the string form.
  5. How would you strip tags but preserve line breaks?
    Replace <br> and closing </p> tags with newline characters before calling strip_tags(), or allow those tags in the second parameter if the output is still meant to be HTML.
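The line-break question can be sketched as a small helper. The function name html_to_text_with_breaks() and the regular expression are illustrative assumptions, not a standard API, and the pattern only covers <br> variants and closing </p> tags.

```php
<?php
// Hypothetical helper: turn <br> and </p> into newlines, then strip the rest.
function html_to_text_with_breaks(string $html): string
{
    // Case-insensitive match for <br>, <br/>, <br /> and closing </p>.
    // preg_replace() returns null on error, so fall back to the input.
    $withNewlines = preg_replace('~<br\s*/?>|</p\s*>~i', "\n", $html) ?? $html;

    // Remove the remaining tags and trim the trailing newline.
    return trim(strip_tags($withNewlines));
}

echo html_to_text_with_breaks('<p>One<br>Two</p><p>Three</p>'), PHP_EOL;
// One
// Two
// Three
```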

Senior Level Questions

  1. Explain why strip_tags() might not be sufficient for sanitizing user input in web applications.
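    Because it only removes tags and performs no context-aware filtering. Attributes on allowed tags, including JavaScript event handlers, pass through unchanged, and the result is not escaped for the context in which it is output, so XSS vectors can remain.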
  2. How can you extend the functionality of strip_tags() to allow safe tags with limited attributes?
    You need to implement or use a library that parses HTML and selectively whitelists tags and attributes, because strip_tags() cannot filter attributes: it keeps every attribute on an allowed tag and discards disallowed tags entirely.
  3. What are performance considerations when using strip_tags() on large strings or HTML documents?
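    strip_tags() scans the input in a single pass, so speed is rarely the limiting factor, although the whole input and result are held in memory. The larger concern with full documents is correctness, since it ignores document structure; a proper HTML parser is preferred for large or complex HTML.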
  4. How does strip_tags() handle uppercase or mixed-case HTML tags?
    Tag names are matched case-insensitively, both in the input and in the allowable tags parameter, so <B> is kept when "<b>" is allowed. A kept tag appears in the output in its original case.
  5. Can you explain how strip_tags() treats malformed HTML tags?
    It does not validate the HTML. It scans for tag delimiters with a simple state machine rather than a full HTML parser, so partial or broken tags can cause more text to be removed than expected.
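Several of the answers above can be checked with short calls. This is a small sketch on made-up inputs, covering case-insensitive matching, script content, and HTML comments.

```php
<?php
// Tag names are matched case-insensitively; a kept tag retains its original case.
$upper = strip_tags('<B>bold</B> <i>it</i>', '<b>');
echo $upper, PHP_EOL; // <B>bold</B> it

// Only the <script> tags are removed; the code between them stays as text.
$script = strip_tags('<script>alert(1)</script>Hi');
echo $script, PHP_EOL; // alert(1)Hi

// HTML comments are removed entirely, including their content.
$comment = strip_tags('Text<!-- note -->more');
echo $comment, PHP_EOL; // Textmore
```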

Frequently Asked Questions (FAQ)

Q: Does strip_tags() remove attributes from HTML tags?
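A: For tags that are stripped, yes: the entire tag goes, including its attributes, and the content inside the tag is left intact. Tags you allow in the second parameter keep all of their attributes unchanged.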
Q: Can strip_tags() remove JavaScript event handlers embedded in tags?
A: When a tag is stripped, its event handler attributes disappear with it. Event handlers on allowed tags are kept, and JavaScript that appears as text, such as the content of a <script> element, remains in the output.
Q: How do I keep certain formatting like <b> or <i> while stripping others?
A: Pass those tags as a string in the second parameter, e.g. strip_tags($string, "<b><i>").
Q: Is strip_tags() suitable for sanitizing all user inputs?
A: No, it is primarily for removing tags. Use it alongside other sanitization functions like htmlspecialchars() and input validation.
Q: Can the strip_tags() function break HTML structure?
A: Yes. Because it only removes tags without checking document structure, it can produce invalid or unexpected HTML output.

Conclusion

The PHP strip_tags() function is a simple yet powerful way to strip HTML and PHP tags from strings. It is especially useful for cleaning inputs, preparing text for safe display, and extracting plain text from formatted content. However, strip_tags() alone does not guarantee full security or prevention against XSS, so it must be used carefully in combination with other sanitization and validation techniques.

By understanding its parameters, limitations, and usage scenarios, you can leverage strip_tags() effectively as part of your PHP string manipulation toolkit.