PHP str_word_count() Function

PHP str_word_count() - Count Words

Learn how PHP's built-in str_word_count() function counts the words in a string. This tutorial covers its syntax, its return formats, and practical ways to analyze text content by counting or extracting words.

Introduction

The str_word_count() function in PHP is a handy string function used to count the number of words present in a given string. It provides versatile ways to retrieve word counts or lists of words, helping developers analyze and manipulate textual data. Whether you're processing user input, analyzing articles, or preparing data for SEO purposes, str_word_count() is essential for quick word count operations.

Prerequisites

  • Basic knowledge of PHP programming language.
  • A working PHP environment (str_word_count() has been available since PHP 4.3.0; PHP 7+ is recommended for modern projects).
  • Understanding of string operations in PHP.
  • Access to a code editor and PHP runtime (e.g., local server or online PHP sandbox).

Setup Steps

  1. Ensure PHP is installed and properly configured on your machine.
  2. Create a new PHP file, for example, word_count_example.php.
  3. Use any text editor or IDE like VS Code, PhpStorm, Sublime Text, or Notepad++.
  4. Write your PHP code that utilizes the str_word_count() function (examples covered below).
  5. Run the PHP script via your server or CLI to see the output.

Understanding str_word_count() Syntax

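str_word_count(string $string, int $format = 0, ?string $charlist = null): array|int
  • $string: The input string to count words from.
  • $format: Optional. Determines the return type:
    • 0 (default) - Returns an integer representing the number of words found.
    • 1 - Returns an array containing all the words found inside the string.
    • 2 - Returns an associative array, where the key is the byte position of the word in the string and the value is the word itself.
  • $charlist: Optional. List of additional characters which will be considered as part of a word. The PHP 8 manual names this parameter $characters, which matters only if you call the function with named arguments.

For this function, a word is a run of alphabetic characters (as defined by the current locale) that may also contain apostrophes (') and hyphens (-).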

Practical Examples

Example 1: Count number of words in a string

<?php
$text = "PHP str_word_count() counts the number of words.";
$wordCount = str_word_count($text);
echo "Number of words: " . $wordCount;
?>

Output:

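Number of words: 9

Underscores and parentheses are not word characters, so str_word_count() is split into str, word, and count. Together with PHP, counts, the, number, of, and words, that makes 9.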

Example 2: Retrieve words as an array

<?php
$text = "Count each word separately in this sentence.";
$wordsArray = str_word_count($text, 1);
print_r($wordsArray);
?>

Output:

Array
(
    [0] => Count
    [1] => each
    [2] => word
    [3] => separately
    [4] => in
    [5] => this
    [6] => sentence
)

Example 3: Get words with their starting position as keys

<?php
$text = "Track word positions efficiently.";
$wordsAssoc = str_word_count($text, 2);
print_r($wordsAssoc);
?>

Output:

Array
(
    [0] => Track
    [6] => word
    [11] => positions
    [21] => efficiently
)

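Example 4: Words containing apostrophes

Apostrophes inside a word are kept by default, so the third argument below is not strictly required; it is passed here only to show where extra word characters go.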

<?php
$text = "It's a developer's task to count words.";
$wordsIncludingApostrophe = str_word_count($text, 1, "'");
print_r($wordsIncludingApostrophe);
?>

Output:

Array
(
    [0] => It's
    [1] => a
    [2] => developer's
    [3] => task
    [4] => to
    [5] => count
    [6] => words
)

Best Practices

  • Use the appropriate $format option based on whether you need a count or the actual words.
  • Apostrophes and hyphens inside words are handled by default, so contractions such as "it's" need no extra setup; use the $charlist parameter for other characters you want kept inside words (e.g., digits or underscores).
  • Strip markup and trim input strings before word counting (for example with strip_tags()) so that tag names and entity names are not counted as words.
  • Remember that str_word_count() treats only letters (as defined by the current locale), apostrophes, and hyphens as part of words; digits and other symbols split or drop tokens unless you add them through $charlist.
  • Use this function when you need quick and simple word counts; for more complex natural language parsing, consider more advanced libraries.
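
As a rough illustration of the sanitizing advice above, the sketch below counts the words in an HTML fragment before and after cleanup. The fragment is hypothetical, and the cleanup chain (strip_tags(), html_entity_decode(), trim()) is one reasonable choice rather than the only one.

```php
<?php
// Hypothetical HTML fragment standing in for user-submitted content.
$html = '<p>Fast &amp; <strong>simple</strong> word counts</p>';

// Counting the raw markup also counts tag and entity names:
// p, Fast, amp, strong, simple, strong, word, counts, p
$rawCount = str_word_count($html);

// Remove the tags, then decode entities so "&amp;" becomes "&"
// instead of contributing the word "amp".
$plain = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
$cleanCount = str_word_count(trim($plain));

echo "Raw: $rawCount, clean: $cleanCount" . PHP_EOL; // Raw: 9, clean: 4
```

Note that strip_tags() removes tags without inserting spaces, so text from adjacent block elements (such as two consecutive paragraphs) can run together; replacing closing tags with a space first is one way around that.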

Common Mistakes

  • Assuming str_word_count() counts digits or special characters as part of words by default; it does not unless they are added via $charlist, so "fri3nd" is split into "fri" and "nd".
  • Passing a non-string value without type checking: scalars are coerced to strings in the default typing mode, while arrays (and any non-string value under strict_types) raise a TypeError in PHP 8.
  • Not handling multibyte characters properly in UTF-8 strings; str_word_count() is not multibyte-safe.
  • Relying on str_word_count() for complex linguistics tasks like stemming or lemmatization (better suited for external NLP tools).
  • Ignoring performance in large datasets; for extremely large strings, consider chunk processing for better memory usage.
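
One simple way to approach the chunk-processing point is to read the input line by line and sum the per-line counts. The sketch below assumes the text is available as a stream; countWordsInStream() is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical helper: sums str_word_count() over each line of an open stream.
function countWordsInStream($handle): int
{
    $total = 0;
    // fgets() returns one line at a time, so only that line is held in memory.
    while (($line = fgets($handle)) !== false) {
        $total += str_word_count($line);
    }
    return $total;
}

// Demo with an in-memory stream; a real script would more likely fopen() a file.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "first line of text\nsecond line here\n");
rewind($handle);
$total = countWordsInStream($handle);
fclose($handle);

echo $total . PHP_EOL; // 7
```

Splitting on line breaks is safe because a newline is never part of a word. Totals may still differ slightly from a single call in edge cases, since the function treats an apostrophe or hyphen at the very start of each input string (and a hyphen at the very end) specially.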

Interview Questions

Junior Level

  • Q1: What does the PHP function str_word_count() do?
    A: It counts the number of words in a given string.
  • Q2: Which parameter do you use to get an array of words instead of just their count?
    A: The second parameter, by setting it to 1.
  • Q3: What will str_word_count("Hello World!") return by default?
    A: It will return 2, the number of words.
  • Q4: Can str_word_count() include apostrophes inside words?
    A: Yes. Apostrophes inside a word are kept by default (e.g., "it's" is one word), so nothing needs to be added to the third $charlist argument.
  • Q5: What type of value does str_word_count() return if the second parameter is set to 2?
    A: It returns an associative array with word positions as keys.

Mid Level

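  • Q1: How would you count words in a string that contains hyphenated words as one word?
    A: Nothing extra is needed: hyphens inside a word are kept by default, so "well-known" counts as one word. The $charlist parameter is only required for characters that are not word characters by default.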
  • Q2: Does str_word_count() support multibyte encoding strings like UTF-8?
    A: No, it is not multibyte-safe and may miscount words with multibyte characters.
  • Q3: Write a code snippet to extract all words and their starting positions from a sentence.
    A:
    $text = "Example sentence.";
    $words = str_word_count($text, 2);
    print_r($words);
  • Q4: What will happen if you pass an integer instead of a string to str_word_count()?
    A: In the default typing mode PHP converts the integer to a string such as "123"; because it contains no letters, the result is 0. With declare(strict_types=1), the call throws a TypeError.
  • Q5: How can you count words considering underscores as part of words?
    A: Add underscore _ in the $charlist parameter.
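
The hyphen, underscore, and integer cases from the questions above can be checked with a few short calls. This is only a sketch with made-up input strings; the explicit (string) cast keeps it valid under strict_types as well.

```php
<?php
// Hyphens inside a word are kept by default; no third argument is needed.
$hyphenated = str_word_count("A well-known state-of-the-art parser", 1);
// A, well-known, state-of-the-art, parser

// Underscores split words unless they are listed in the third argument.
$split = str_word_count("snake_case name", 1);        // snake, case, name
$joined = str_word_count("snake_case name", 1, '_');  // snake_case, name

// An integer becomes the string "2024", which contains no letters.
$fromInt = str_word_count((string) 2024);             // 0

echo count($hyphenated) . ', ' . count($split) . ', ' . count($joined) . ', ' . $fromInt . PHP_EOL;
// 4, 3, 2, 0
```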

Senior Level

  • Q1: How would you improve word counting for UTF-8 strings in PHP since str_word_count() is not multibyte-safe?
    A: Use `preg_match_all()` with appropriate Unicode regex or mbstring functions instead of str_word_count().
  • Q2: How can you handle counting words in complex sentences containing emojis or special Unicode symbols?
    A: Employ regex that matches Unicode word characters or use libraries designed for Unicode-aware word tokenization.
  • Q3: Explain how you would create a custom word counting function to include numbers and email addresses.
    A: Use regular expressions to detect words, numbers, and emails separately and combine counts accordingly.
  • Q4: Discuss performance considerations when using str_word_count() on very large texts.
    A: For very large texts, process chunks incrementally instead of one large call to reduce memory footprint.
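  • Q5: Is it possible to extend str_word_count() to recognize other language-specific word boundaries?
    A: Only to a limited degree. It classifies letters byte by byte according to the current locale, and $charlist can add further single-byte characters, but it cannot segment UTF-8 text or languages that do not separate words with spaces. For those, use Unicode-aware regular expressions or NLP libraries.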
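
For the UTF-8 questions above, one possible replacement is a small wrapper around preg_match_all() with Unicode properties. The sketch below is an assumption-laden example, not a drop-in equivalent: countWordsUnicode() is a hypothetical name, the pattern is one of several reasonable definitions of a "word", and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: counts runs of Unicode letters as words,
// allowing apostrophes and hyphens after the first letter.
function countWordsUnicode(string $text): int
{
    // \p{L} matches any letter, \p{M} any combining mark;
    // the /u modifier makes PCRE treat pattern and subject as UTF-8.
    $matched = preg_match_all('/\p{L}[\p{L}\p{M}\'-]*/u', $text);

    // preg_match_all() returns false on failure, e.g. for invalid UTF-8 input.
    return $matched === false ? 0 : $matched;
}

$count = countWordsUnicode("Café naïve über résumé");
echo $count . PHP_EOL; // 4
```

Emojis and other symbols are not letters, so this pattern skips them. It still does not segment scripts written without spaces (such as Chinese or Japanese), where a run of characters is matched as a single word; dedicated tokenization libraries are the better fit there.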

FAQ

1. Can str_word_count() count words in non-English languages?
It works best with English or similar languages. For languages with different word boundaries (e.g., Chinese, Japanese), it may not work correctly.
2. How do I count words including numbers?
By default, digits are not word characters, so standalone numbers are skipped. Add the digits to $charlist (e.g., "0123456789") or use a custom regex with preg_match_all().
3. Does str_word_count() consider punctuation as word boundaries?
Yes, punctuation marks separate words. The exceptions are apostrophes and hyphens, which are kept inside words by default, and any characters you add through the $charlist parameter.
4. What is the difference between format options 1 and 2?
Format 1 returns a simple array of words, while format 2 returns an associative array with word positions as keys.
5. Is there any alternative to str_word_count() in PHP?
You can use preg_match_all() with word regex or PHP libraries for more advanced text tokenization.

Conclusion

The PHP str_word_count() function is a straightforward and effective tool to count and extract words from strings. Perfect for quick word counts and basic text analysis, its flexible parameters allow you to retrieve counts, word arrays, or position-mapped words. While it has limitations with multibyte strings and complex languages, it remains a valuable function for many string processing tasks. By following best practices and understanding its behavior, you can effectively integrate word count functionality into your PHP projects.