July 22nd, 2013 by Lincoln Baxter III

Guide to Regular Expressions in Java (Part 1)

Often unknown, or heralded as confusing, regular expressions (regex) have defined the standard for powerful text manipulation and search. Without them, many of the applications we know today would not function. This two-part series explores the basics of regular expressions in Java, and provides tutorial examples in the hopes of spreading love for our pattern-matching friends. (Read part two.)

Part 1: What are Regular Expressions?

Regular expressions are a language of string patterns built in to most modern programming languages, including Java 1.4 onward; they can be used for searching, extracting, and modifying text. This part covers basic syntax and use.
This article is part one in the series: “Regular Expressions.” Read part two for more information on lookaheads, lookbehinds, and configuring the matching engine. To get a more visual look into how regular expressions work, try our visual java regex tester. You can also watch a video to see how the visual regex tester works.

1. Syntax

Regular expressions, by definition, are string patterns that describe text. These descriptions can then be used in nearly infinite ways. The basic language constructs include character classes, quantifiers, and meta-characters.

1.1. Character Classes

Character classes are used to define the content of the pattern. E.g. what should the pattern look for?
.  	Dot, any character (may or may not match line terminators, read on)
\d  	A digit: [0-9]
\D  	A non-digit: [^0-9]
\s  	A whitespace character: [ \t\n\x0B\f\r]
\S  	A non-whitespace character: [^\s]
\w  	A word character: [a-zA-Z_0-9]
\W  	A non-word character: [^\w]
Notice, however, that in Java string literals you will need to “double escape” these backslashes.
String pattern = "\\d \\D \\W \\w \\S \\s";
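As a quick illustration of the double escaping, here is a minimal sketch. The class name `EscapeSketch`, the helper method, and the sample strings are made up for this example.

```java
import java.util.regex.Pattern;

class EscapeSketch {
    // In Java source, "\\d" is the two-character string \d,
    // which the regex engine then reads as "a digit".
    static boolean isDigitThenWordChar(String text) {
        return Pattern.matches("\\d\\w", text);
    }

    public static void main(String[] args) {
        System.out.println("\\d".length());             // 2: a backslash and the letter d
        System.out.println(isDigitThenWordChar("7a"));  // true
        System.out.println(isDigitThenWordChar("a7"));  // false
    }
}
```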

1.2. Quantifiers

Quantifiers can be used to specify the number or length that part of a pattern should match or repeat. A quantifier will bind to the expression group to its immediate left.
*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times
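The quantifiers above can be tried out with a few whole-string checks. This is only a sketch; `QuantifierSketch` and its `whole` helper are hypothetical names, and the sample inputs are invented for illustration.

```java
class QuantifierSketch {
    // String.matches() requires the whole string to match the pattern.
    static boolean whole(String regex, String text) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(whole("ab*", "a"));          // true:  zero 'b' characters is allowed
        System.out.println(whole("ab+", "a"));          // false: at least one 'b' is required
        System.out.println(whole("colou?r", "color"));  // true:  the 'u' is optional
        System.out.println(whole("\\d{3}", "12"));      // false: exactly three digits required
        System.out.println(whole("\\d{2,}", "12345"));  // true:  two or more digits
        System.out.println(whole("\\d{2,4}", "12345")); // false: at most four digits
    }
}
```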

1.3. Meta-characters

Meta-characters are used to group, divide, and perform special operations in patterns.
\   	Escape the next meta-character (it becomes a normal/literal character)
^   	Match the beginning of the line
.   	Match any character (except newline)
$   	Match the end of the line (or before newline at the end)
|   	Alternation (‘or’ statement)
()  	Grouping
[]  	Custom character class
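To see these meta-characters at work, here is a small sketch that searches inside a string with `Matcher.find()`. The class `MetaCharSketch`, the `found` helper, and the sample strings are assumptions made for this example.

```java
import java.util.regex.Pattern;

class MetaCharSketch {
    // find() looks for the pattern anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("3.14", "3x14"));     // true:  an unescaped dot matches any character
        System.out.println(found("3\\.14", "3x14"));   // false: the escaped dot is a literal '.'
        System.out.println(found("^cat", "a cat"));    // false: 'cat' is not at the beginning
        System.out.println(found("cat$", "a cat"));    // true:  'cat' is at the end
        System.out.println(found("cat|dog", "hot dog")); // true: either alternative may match
        System.out.println(found("gr[ae]y", "grey"));  // true:  custom class allows 'a' or 'e'
    }
}
```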

2. Examples

2.1. Basic Expressions

Every string is a regular expression. For example, the string, “I lost my wallet”, is a regular expression that will match the text, “I lost my wallet”, and will ignore everything else. What if we want to be able to find more things that we lost? We can replace wallet with a character class expression that will match any word.
"I lost my \\w+"
As you can see, this pattern uses both a character class and a quantifier. “\w” says match a word character, and “+” says match one or more. So when combined, the pattern says “match one or more word characters.” Now the pattern will match any word in place of “wallet”, e.g. “I lost my sablefish” or “I lost my parrot”, but it will not match “I lost my: trooper”, because the pattern expects a space directly after “my” and finds the ":" character instead, so the match fails at that point. If we want the expression to be able to handle this situation, then we need to make a small change.
"I lost my:? \\w+"
Now the expression will allow an optional ":" directly after the word ‘my’.
Try this example online with our Visual Java Regex Tester.
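For readers who prefer to run it locally, the pattern with the optional colon can be exercised as follows. This is a minimal sketch; the class name `LostItemSketch` and the method `mentionsLoss` are hypothetical.

```java
import java.util.regex.Pattern;

class LostItemSketch {
    private static final Pattern LOST = Pattern.compile("I lost my:? \\w+");

    // True if the text contains the phrase, with or without the colon.
    static boolean mentionsLoss(String text) {
        return LOST.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsLoss("I lost my sablefish"));  // true
        System.out.println(mentionsLoss("I lost my: trooper"));   // true
        System.out.println(mentionsLoss("I lost my- car"));       // false: '-' is not in the pattern
    }
}
```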

2.2. Basic Grouping

An important feature of regular expressions is the ability to group sections of a pattern, and provide alternate matches.
|   	Alternation (‘or’ statement)
()  	Grouping
These two meta-characters are core parts of flexible regular expressions. For instance, in the first example we lost our wallet. What if we knew exactly which types of objects we had lost, and we wanted to find those objects but nothing else? We can use a group (), with an ‘or’ meta-character in order to specify a list of expressions to allow in our match.
"I lost my:? (wallet|car|cell phone|marbles)"
The new expression will now match the beginning of the string “I lost my”, an optional ":", a space, and then any one of the expressions in the group, which are separated by the alternation character "|". Any one of ‘wallet’, ‘car’, ‘cell phone’, or ‘marbles’ would be a match.
"I lost my wallet"		matches
"I lost my wallets"		matches		the ‘s’ is not needed and is ignored (when searching with find())
"I lost my: car"		matches
"I lost my- car"		doesn’t match	‘-’ is not allowed in our pattern
"I lost my: cell"		doesn’t match	all of ‘cell phone’ is needed
"I lost my: cell phone"	matches
"I lost my cell phone"		matches
"I lost my marbles"		matches
Try this example online with our Visual Java Regex Tester
As you can see, the combinations for matches quickly become very large. This is not the complete set, as there are several more phrases that would match our simple pattern.
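Whether a string such as “I lost my wallets” counts as a match depends on which Java method is used: `Matcher.find()` searches for the pattern anywhere in the input, while `Matcher.matches()` (like `String.matches()`) requires the entire input to match. The sketch below shows the difference; `GroupingSketch` and its method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingSketch {
    private static final Pattern LOST =
        Pattern.compile("I lost my:? (wallet|car|cell phone|marbles)");

    // Returns the lost item found anywhere in the text, or null if there is none.
    static String lostItem(String text) {
        Matcher m = LOST.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // True only if the whole text matches the pattern.
    static boolean isExactly(String text) {
        return LOST.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(lostItem("I lost my wallets"));   // wallet
        System.out.println(isExactly("I lost my wallets"));  // false: the trailing 's' is left over
        System.out.println(lostItem("I lost my: cell"));     // null: all of 'cell phone' is needed
        System.out.println(isExactly("I lost my: car"));     // true
    }
}
```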

Quiz: Can you figure out all possible matches for this pattern? (See the answers.)

"I lost my:? (wallet|car|cell phone|marbles)"

Answer: This is a trick question! Because this regular expression is unbounded (it has no beginning `^` and no ending `$` meta-characters to anchor the match), the pattern we’ve created will actually match any string containing one of the results below. In short, nearly infinite possible matches:

"I lost my wallet"
"I lost my: wallet"
"I lost my car"
"I lost my: car"
"I lost my cell phone"
"I lost my: cell phone"
"I lost my marbles"
"I lost my: marbles"

However, if we did want to limit our pattern to just these results, we could add the required anchors to our pattern, like so:

"^I lost my:? (wallet|car|cell phone|marbles)$"

2.3. Matching/Validating

Regular expressions make it possible to test whether text matches a certain pattern, returning a Boolean value that says whether the pattern was found. This can be used to validate input such as phone numbers, social security numbers, email addresses, and web form data, to scrub data, and much more. E.g. if a string matches the pattern for an SSN, then the string is treated as an SSN.
Sample code
import java.util.ArrayList;
import java.util.List;

public class ValidateDemo {
	public static void main(String[] args) {
		List<String> input = new ArrayList<String>();
		input.add("123-45-6789");
		input.add("9876-5-4321");
		input.add("987-65-4321 (attack)");
		input.add("987-65-4321 ");
		input.add("192-83-7465");


		for (String ssn : input) {
			if (ssn.matches("^(\\d{3}-?\\d{2}-?\\d{4})$")) {
				System.out.println("Found good SSN: " + ssn);
			}
		}
	}
}
This produces the following output:
Found good SSN: 123-45-6789
Found good SSN: 192-83-7465
Try this example online with our Visual Java Regex Tester
Dissecting the pattern:
"^(\\d{3}-?\\d{2}-?\\d{4})$"
^		match the beginning of the line
() 		group everything within the parentheses as group 1
\d{n}		match n digits, where n is a number equal to or greater than zero
-?		optionally match a dash
$		match the end of the line

2.4. Extracting/Capturing

Specific values can be selected out of a large complex body of text. These values can be used in the application.
Sample code
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;

public class ExtractDemo {
	public static void main(String[] args) {
		String input = "I have a cat, but I like my dog better.";

		Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)");
		Matcher m = p.matcher(input);

		List<String> animals = new ArrayList<String>();
		while (m.find()) {
			System.out.println("Found a " + m.group() + ".");
			animals.add(m.group());
		}
	}
}
This produces the following output:
Found a cat.
Found a dog.
Try this example online with our Visual Java Regex Tester
Dissecting the pattern:
"(mouse|cat|dog|wolf|bear|human)"
()		group everything within the parentheses as group 1
mouse		match the text ‘mouse’
|		alternation: match any one of the sections of this group
cat		match the text ‘cat’
 
//...and so on

2.5. Modifying/Substitution

Values in text can be replaced with new values, for example, you could replace all instances of the word ‘clientId=’, followed by a number, with a mask to hide the original text. (See below) For sanitizing log files, URI strings and parameters, and form data, this can be a useful method of filtering sensitive information. A simple, reusable utility class can be used to encapsulate this into a more streamlined method.
Sample code
import java.util.regex.*;

public class ReplaceDemo {
	public static void main(String[] args) {
		String input = 
                  "User clientId=23421. Some more text clientId=33432. This clientNum=100";

		Pattern p = Pattern.compile("(clientId=)(\\d+)");
		Matcher m = p.matcher(input);

		StringBuffer result = new StringBuffer();
		while (m.find()) {
			System.out.println("Masking: " + m.group(2));
			m.appendReplacement(result, m.group(1) + "***masked***");
		}
		m.appendTail(result);
		System.out.println(result);
	}
}
This produces the following output:
Masking: 23421
Masking: 33432
User clientId=***masked***. Some more text clientId=***masked***. This clientNum=100
Try this example online with our Visual Java Regex Tester
Dissecting the pattern:
"(clientId=)(\\d+)"
(clientId=) 	group everything within the parentheses as group 1
clientId=	match the text ‘clientId=’
(\\d+)		group everything within the parentheses as group 2
\\d+		match one or more digits
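The reusable utility mentioned earlier could look something like the sketch below. It is one possible shape rather than a definitive implementation: the class name `MaskSketch` and the method `maskClientIds` are hypothetical, and it swaps the explicit `appendReplacement` loop for `Matcher.replaceAll`, where `$1` in the replacement string refers to group 1.

```java
import java.util.regex.Pattern;

class MaskSketch {
    // Compiled once and reused for every call.
    private static final Pattern CLIENT_ID = Pattern.compile("(clientId=)(\\d+)");

    // Replaces the digits after each "clientId=" with a fixed mask.
    static String maskClientIds(String text) {
        // "$1" keeps the "clientId=" prefix captured by group 1.
        return CLIENT_ID.matcher(text).replaceAll("$1***masked***");
    }

    public static void main(String[] args) {
        String input = "User clientId=23421. Some more text clientId=33432. This clientNum=100";
        System.out.println(maskClientIds(input));
    }
}
```

The loop-based version in the sample code is still the better fit when each match needs extra processing, such as logging the value being masked.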

Notice how groups begin numbering at 1, and increment by one for each new group. Groups may contain groups, however; numbering follows the order of the opening parentheses from left to right, so the outer group is numbered first and the first group nested inside it comes next. Referencing group 0 gives you the entire chunk of text that matched the regex.

(  ( ) (  ( ) ( ))) ( )	//and so on
 1  2   3  4   5     6		//0 = everything the pattern matched
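The numbering of nested groups can be checked with a short sketch. The pattern below is an illustrative variation on the SSN example with extra nesting added; `GroupNumberSketch` and its `groups` method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberSketch {
    // Group 1 wraps groups 2 and 3; group 4 stands on its own.
    private static final Pattern NESTED =
        Pattern.compile("((\\d{3})-(\\d{2}))-(\\d{4})");

    // Returns group 0 through groupCount(), or an empty array if the text does not match.
    static String[] groups(String text) {
        Matcher m = NESTED.matcher(text);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("123-45-6789");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
        // group 0 = 123-45-6789
        // group 1 = 123-45
        // group 2 = 123
        // group 3 = 45
        // group 4 = 6789
    }
}
```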

3. Conclusion & Next Steps

Wrapping up, regular expressions are not difficult to master – in fact, they are quite easy. My strategy, whenever building a new regular expression, is to start with the simplest, most general match possible. From there, I continuously add more and more complexity until I have matched, substituted, or inserted exactly what I need.

Don’t be afraid to “express” yourself! When you’ve got the hang of these techniques, or need something a little fancier, read part two for more information on lookaheads, lookbehinds, and configuring the matching engine.

Lincoln Baxter, III

About the author:

Lincoln Baxter, III is a Principal Software Engineer at Red Hat, working on JBoss open-source projects; most notably as creator & project lead of JBoss Forge, and author of Errai UI. This blog represents his personal thoughts and perspectives, not necessarily those of his employer.

He is a founder of OCPsoft, the author of PrettyFaces and Rewrite, the leading URL-rewriting extensions for Servlet, Java EE, and Java web frameworks; he is also the author of PrettyTime, social-style date and timestamp formatting for Java. When he is not swimming, running, or playing Ultimate Frisbee, Lincoln is focused on promoting open-source software and making web-applications more accessible for small businesses and individuals.

Posted in OpenSource

54 Comments

  1. [...] provides tutorial examples in the hopes of spreading love for our pattern-matching friends. (Read part one.) Part 2: Look-ahead & Configuration flagsHave you ever wanted to find something in a string, [...]

  2. Gene De Lisa says:

    Good job. Readable. Understandable. Clear examples. No unnecessary chest beating.

    Keep it up.

  3. [...] and look-behind operations use syntax that could be confused with grouping (See Ch. 1 – Basic Grouping,) but these patterns do not capture values; therefore, using these constructs, no values will be [...]

  4. Nitin Gautam says:

    Thanks

  5. param says:

    I have to write a regular expression in java for the following test case:

    /**
    * Test for category 2000, state 11 in a state mask
    */
    public void testStateMask1() {
        String regex = RegexTrainer.stateMask1;
        try {
          assertTrue(regex != null && regex.length() > 0);
          assertTrue("Didn't match 2000011", 
                  Pattern.matches(regex, "2000011"));
          assertTrue("Didn't match 19000012000011", 
                  Pattern.matches(regex, "19000012000011"));
          assertTrue("Didn't match 190000120000112100001", 
                  Pattern.matches(regex, "190000120000112100001"));
          assertFalse("Matched 2000010", Pattern.matches(regex, "2000010"));
          assertFalse("Matched 010000112000011300001", 
                  Pattern.matches(regex, "010000112000011300001"));
        } catch(Exception e) {
          logger.error(e);
          fail();
        }
      }

    I have written the following regular expression but its working only for assertTrue

    /**
    * Test for category 2000, state 11 in a state mask
    */
    public static String stateMask1 = "^((\\d{7}){1,3})$";
  6. Hi there,

    Your regex matches a group of Seven(7) digits, between 1 and 3 times. This means that it matches Seven, Fourteen, or Twenty-one digits in a row, so all of the match operations you have listed above will succeed.

    It does nothing other than that. I’m not exactly sure what you need it to do based on your description of the problem.

    I hope this helps,
    ~Lincoln

  7. [...] Lincoln Baxter III Admin Also, that EL pattern is correct. Square brackets in regular expressions denote custom character classes. See reference: (http://ocpsoft.com/opensource/guide-to-regular-expressions-in-java-part-1/#charclasses) [...]

  8. Ygor Fonseca says:

    Hi, i would like to make a pattern that attend this:

    /event/anything/eventId/

    Ex: /event/coldplay-18-03-2012-new-york/123/

    1. Sounds like you want to do some URL-rewriting? There are a few libraries out there to do this:

      http://ocpsoft.com/prettyfaces/
      http://ocpsoft.com/rewrite/

      But if you just want a regular expression to match this type of URL, the following should do the trick:

      String pattern = "/event/[^/]+/\\d+/";
  9. MrBCut says:

    Hi LBIII

    Thanks so much for this tut. Best one out there! Regex can be a tricky concept to get (and I bet explain), so I am very much appreciative for your help! I appreciate your efforts wholeheartedly!

    Bookmarked! lol

    1. You’re welcome! Glad you found it useful!

  10. Paul says:

    Hi,

    Really cool stuff.

    Just to say it, but I think that there is a difference between #matches and #find method since the #matches method will always try to match the whole input while #find method will do it for any part of the input. You could say that #matches is like a #find but adding “^” and “$” char at the beginning and the end of the input.

    So, in the sample 2.3 :

    if (ssn.matches("^(\\d{3}-?\\d{2}-?\\d{4})$"))

    “^” and “$” chars are not required. And in the 2.2 sample, the input “I lost my wallets” will only match with the #find method, not with the #matches one (all other examples are fine with both methods).

    Anyone correct me if I’m wrong.

  11. TestConfig says:

    Hello Lincoln Baxter III,

    Very nice example, simple to understand :).
    Great work, keep it up.

  12. vaidhyanathan says:

    Hi,

    I want to write a password pattern to match 3 out of below 4 criteria’s a :-
    1) Password must contain atleast 1 numeric value
    2) Password must contain atleast 1 lower-case letter
    3) Password must contain atleast 1 upper-case letter
    4) Password must contain atleast 1 of these special characters !”#$%&’()*+,./;:=?_@>-
    Please help me in implementing the same

    1. While it would seem tempting to implement this using a single regular expression (which is certainly possible), I would recommend splitting this up into 4 individual checks, with unit tests for each check.

      In this situation, clarity should be preferred over brevity, and the regular expression you want to construct will be a bit opaque if you attempt a one-liner. Performance is not really an issue for something like this (unless you have some strange requirements or expectations).

      This is really pretty easy, so I’ll give you this code under one condition – you have to post a link on a blog back to this article! ;)

      public boolean passwordValidates( String pass ) {
         int count = 0;
      
         if( pass.matches(".*\\d.*") )
            count ++;
         if( pass.matches(".*[a-z].*") )
            count ++;
         if( pass.matches(".*[A-Z].*") )
            count ++;
         if( pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*") )
            count ++;
      
         return count >= 3;
      }
    1. vaidhyanathan says:

      Thanks for the above code. There is one scenario where the above code will break: if 3 criteria are matched and a special character is entered that is not in the allowed list, this code will return true. For example, in aA123@~, 3 criteria are matched but ‘~’ is not in the allowed list. As per the requirement, if 3 criteria are matched and any special character outside the allowed list is entered, then the password should not be allowed. Do you have a snippet for this scenario?

      1. vaidhyanathan says:

        Basically,Im looking if there’s a regular expression for finding the blacklisted special characters.

      2. vaidhyanathan says:

        I just tried this one ,it works fine (not tested all the cases).If you have a better approach please suggest me .I would sure post a link back :).

        public boolean passwordValidates( String pass ) {
           int count = 0;
           boolean pattern = true;
           if( pass.matches(".*\\d.*") )
              count ++;
           if( pass.matches(".*[a-z].*") )
              count ++;
           if( pass.matches(".*[A-Z].*") )
              count ++;
           if( pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*") )
              count ++;
           if( count == 3 ) {
              pattern = pass.matches("^[a-zA-Z0-9\\s!\"#$%&'()*+,./;:=?_@>-]{8}$");
              return pattern;
           }

           if( pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*") ) { // Check if there's a special character
              pattern = pass.matches("^[a-zA-Z0-9\\s!\"#$%&'()*+,./;:=?_@>-]{8}$");
              count ++;
           }
           return (count >= 3 && pattern);
        }
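One possible way to handle the blacklisted-character case is to check the whole password against a whitelist first, and only then count the criteria. The sketch below is not the code from either comment above; the class name `PasswordSketch` is hypothetical, the special-character list is taken from the question with plain ASCII quotes assumed, and no length rule is enforced.

```java
class PasswordSketch {
    // Special characters from the question, assuming plain ASCII quotes.
    private static final String SPECIALS = "[!\"#$%&'()*+,./;:=?_@>-]";
    // Every character must be a letter, a digit, or one of the specials.
    private static final String ALLOWED = "[a-zA-Z0-9!\"#$%&'()*+,./;:=?_@>-]+";

    static boolean passwordValidates(String pass) {
        // Reject the password outright if it contains anything outside the whitelist.
        if (!pass.matches(ALLOWED)) {
            return false;
        }
        int count = 0;
        if (pass.matches(".*\\d.*")) count++;               // at least one digit
        if (pass.matches(".*[a-z].*")) count++;             // at least one lower-case letter
        if (pass.matches(".*[A-Z].*")) count++;             // at least one upper-case letter
        if (pass.matches(".*" + SPECIALS + ".*")) count++;  // at least one allowed special
        return count >= 3;
    }

    public static void main(String[] args) {
        System.out.println(passwordValidates("aA123@"));   // true
        System.out.println(passwordValidates("aA123@~"));  // false: '~' is not in the allowed list
        System.out.println(passwordValidates("abc123"));   // false: only two criteria are met
    }
}
```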
  13. Steve says:

    Thanks, Great examples!! : )

    PS: would you be able to add ‘boundary matchers’ (?) to your syntax section and some supporting examples to suit by any chance, I’m reasonably new at this and was reading about them here (url below) but your example format is more comprehensive and much easier to understand (it doesn’t leave anything out : )
    http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html

    Keep up the good work!

    1. Hey Steve :)

      You bet! I’ll put something up over the weekend – it should be no problem.

      Thanks for the motivation!
      ~Lincoln

    2. Taking a little longer than I thought :) had a fun weekend though!

  14. Lincoln A Baxter (Lincoln's Dad) says:

    I would suggest that references be provided to native regular expression man pages. I would think that the way to really understand regexes would be to understand native regexes as they would be used in sed or egrep or other standard Unix utilities, and then understand what the Java library limitations are, if any. I note that just about the first thing you do in part one is talk about the escaping for \ (backslash) in pattern definition strings. I don’t know if there would be a way to do this, given that operator overloads appear not to be possible in Java, but I think a useful capability would be a Java library/module (like PrettyTime) that linguistically overloads, say, the tick (single quote) or slash character so that regular expressions could be defined and be easier to read in Java, consistent with other non-Java examples (cf. the Perl slash (/pattern/) regex delimiter, which is the same as that used by standard Unix utilities like sed). I would think this would make generic regex man pages much more useful to the Java user, increase the readability of regexes in Java, and as a result, maybe increase the general understanding and sophistication of regex usage by Java programmers.

    1. Unfortunately, operator overloads are not possible in Java, and there is no way to override the default behavior of the escape character in string literals, but I agree, it would be nice :)

      Since this is a Java-targeted article, I did link to http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html – the official regex docs. I think linking to man pages here would end up being confusing because the syntax is subtly different regarding escaping and configuration.

      In general, Java regexps are a full implementation of Unix regex, but not as comprehensive as say, PCRE in a few ways. This can, however, be made up for via programmatic usage of the Pattern and Matcher classes.

  15. JP says:

    Great article. By the way String.matches() is another quick way to apply regular expression in Java.

  16. Paraj says:

    Hi,
    I have a string like "John’s classes are nice || john teach nicely".

    when this string value passes to Jsp, where I use single quote ‘, breaks my page.

    I want to write an expression which add \ before ‘ i.e. "John\’s classes are nice \|\| john teach nicely"

    Can you please help me in this.

  17. rosy says:

    I am new to the RegEx. Is there anyway to write a pattern which says that it is a start of a group.
    987 x. 5,6(anything) (xyz). can we write a pattern which says that start of group is (x.)

  18. Rachel says:

    I need a regular expression for finding a missing parenthesis within text. I have the following that finds a missing end parenthesis: \((?!.+?\)), but not matter what I try, I can’t figure out how to find the missing beginning one. Got any ideas?

    1. Honestly, regular expressions are probably not the best solution here. You’re probably better off using a cursor-based matching algorithm.

      Iterate over the array and keep a count of how many parens you encounter. It’s hard for a regex to do this because the regex engine doesn’t have the concept of keeping count of occurrences (at least that you can access.)
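The counting approach described above might be sketched like this. The class `ParenSketch` and both method names are hypothetical, and the sketch only looks at round brackets.

```java
class ParenSketch {
    // Returns the index of the first ')' that has no '(' before it, or -1 if there is none.
    static int firstUnmatchedClose(String text) {
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    return i;  // a closing parenthesis with nothing left to close
                }
                depth--;
            }
        }
        return -1;
    }

    // Returns how many '(' are still open at the end of the text.
    static int unclosedOpenCount(String text) {
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println(firstUnmatchedClose("a) b (c)"));  // 1
        System.out.println(firstUnmatchedClose("(a) (b)"));   // -1
        System.out.println(unclosedOpenCount("(a (b)"));      // 1
    }
}
```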

      1. Rachel says:

        Thanks and I understand it’s not the best. But, is there a possible reverse expression that could be worked out (reverse of the one I gave in my original question)? I would think you could do a negative look behind. I just couldn’t figure out where to put the parentheses in the equation. Is it possible?

  19. Sachin says:

    Hi Lincoln,
    Great Information.

    I want to do something reverse of this. I have to make my password rule to be configure by user through property file with RegEx value, then i have to validate the password value against configured RegEx. I am success ed to validate it but now i have to also show, what is correct password format to the user so that he can correct it accordingly, how can i parse regex & find that it looks for n number of special char, n number of upper case alphabet, n number of numeric character?

    Thanks,
    -Sachin.

    1. I don’t think you really want to try to parse the regex. For passwords, I generally find that using several separate regexes in independent if() blocks -instead of one big regex- allows for more control of the parsing and error reporting.

      1. Sachin says:

        Thanks for the reply. It make sense to split it in multiple groups to have better control on error reporting. But still i would give a try, may be i can put a constraint on configuration, that RegEx need to have in specific sequence of groups & then i can split this string & find the information inside these groups to form the dynamic error message.

        If it won’t work then i will go with your suggestion.

        Thanks,
        -Sachin.

  20. Ria says:

    Hi, can you give me a java code for my assignment. for example the input is Regular expression Alphabet then the output will be the set of string generated by regular expression over the alphabet.
    e.g. input alphabet {a,b} regular expression : b+ output : b, bb,bbb, bbbb….
    or the other e.g
    input: (a || b) b+
    output: {a, b)

    1. Sorry, I won’t help with homework, but you can try out our visual java regex tester, which might help you out:

      http://ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/

  21. Mayur says:

    if i want to detect some characters those are inside round bracket,excludes round bracket..
    How to perform this operation?

    1. Try this:

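      "\\(([^)]+)\\)"

      Notice how you need to double escape the parentheses in a Java string literal: `\\` produces a single backslash in the pattern, and that backslash tells the regex engine to treat the following parenthesis as a literal character rather than as the start or end of a group. The inner, unescaped pair of parentheses is a capturing group, so group 1 holds the text between the round brackets without the brackets themselves. Confusing, I know.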
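Here is a minimal sketch of capturing the text between round brackets, assuming an ordinary Java string literal in which each regex backslash is written as `\\`. The class name `BracketSketch`, the method `insideBrackets`, and the sample text are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketSketch {
    // \\( and \\) match literal parentheses; the inner ( ) is capturing group 1.
    private static final Pattern IN_BRACKETS = Pattern.compile("\\(([^)]+)\\)");

    // Collects the contents of every bracketed section, without the brackets.
    static List<String> insideBrackets(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = IN_BRACKETS.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(insideBrackets("call (me) maybe (later)"));  // [me, later]
    }
}
```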

  22. Simon says:

    i have a List as follows:

    List<String> x = new ArrayList<String>();
    x.add("Date : Jul 15, 2010 Income : 8500 Expenses : 0");
    x.add("Date : Aug 23, 2010 Income : 0 Expenses : 6500");
    x.add("Date : Jul 15, 2010 Income : 0 Expenses : 4500");

    i now want to access these indexes as follows:

    int index1 = x.indexOf("Date : Aug 23, 2010");
    //1
    
    int index2 = x.indexOf("Date : Jul 15, 2010");
    //0
    
    int index3 = x.indexOf("Date : Jul 15, 2010");
    //2

    Any help? Thank’s in advance. Am also a Java enthusiast and Open Source Developer.

  23. Simon says:

    I made a mistake. It should be:

    int index3 = x.lastIndexOf("Date : Jul 15, 2010");
    //2
    1. I’m not really sure what you mean by "access the index" – If you want to find indexes, then the approach you used works fine. However, if what I think you are really asking is to find the date in each element, then you could do something like this:

      List<String> x = new ArrayList<String>();
      x.add("Date : Jul 15, 2010 Income : 8500 Expenses : 0");
      x.add("Date : Aug 23, 2010 Income : 0 Expenses : 6500");
      x.add("Date : Jul 15, 2010 Income : 0 Expenses : 4500");
      Date : (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4}

      I used our Regex tester to figure this out: http://ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/
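In Java, the pattern above could be applied to each list element roughly as follows. This is a sketch under a couple of assumptions: the class `DateExtractSketch` and the method `dateOf` are hypothetical, and an extra outer group has been added around the date so that group 1 holds the full date text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractSketch {
    // Group 1 is the whole date; group 2 is just the month name.
    private static final Pattern DATE = Pattern.compile(
        "Date : ((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \\d{1,2}, \\d{4})");

    // Returns the date found in the entry, or null if there is none.
    static String dateOf(String entry) {
        Matcher m = DATE.matcher(entry);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        List<String> x = new ArrayList<String>();
        x.add("Date : Jul 15, 2010 Income : 8500 Expenses : 0");
        x.add("Date : Aug 23, 2010 Income : 0 Expenses : 6500");
        x.add("Date : Jul 15, 2010 Income : 0 Expenses : 4500");

        for (String entry : x) {
            System.out.println(dateOf(entry));
        }
        // Jul 15, 2010
        // Aug 23, 2010
        // Jul 15, 2010
    }
}
```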

  24. navneet singh says:

    Try & find regular expression for 3 text pattern given below
    1.) 404-5674
    2.) 080 banaglore
    3.) Cloudjini Technology

    1. Huh? I’m not understanding your question. I’ll have to consider this SPAM, and delete it unless you clarify :) Thanks!

  25. Carla says:
    Pattern pattern=Pattern.compile(p.content);
    		Matcher matcher=pattern.matcher("(January|February|March|April|May|June|July|August|September|October|November|December)\\s([1-9]|[1-2]\\d|3[0-1]),\\s(\\d{3,4})");
    if(matcher.find())
    String dateString=matcher.group(0);

    Here is my java code to match a certain date format in a sentence with this regular expression:

    (January|February|March|April|May|June|July|August|September|October|November|December)\s([1-9]|[1-2]\d|3[0-1]),\s(\d{3,4})

    Here is an example of the sentence i’m testing it on:
    Elvis Aaron Presley (January 8, 1935 – August 16, 1977) was one of first and most famous American rock and roll’s superstars. His fame lasted long after his death. He was also an actor who starred in many movie

    I tested it with many online testers and it works here is the result:
    start() = 21, end() = 36
    group(0) = "January 8, 1935"
    group(1) = "January"
    group(2) = "8"
    group(3) = "1935"

    but in java matcher.find() the result is false using this same regex

    Please help me know why it isn’t working in java

    1. First, I think you are using Pattern and Matcher backwards. I think you should be doing this instead:

      @Test
         public void test()
         {
            String string = "Elvis Aaron Presley (January 8, 1935 – August 16, 1977) was one of first and most famous American rock and roll’s superstars. His fame lasted long after his death. He was also an actor who starred in many movie";
            Pattern pattern = Pattern
                     .compile("(January|February|March|April|May|June|July|August|September|October|November|December)\\s([1-9]|[1-2]\\d|3[0-1]),\\s(\\d{3,4})");
            Matcher matcher = pattern.matcher(string);
            if (matcher.find())
            {
               String dateString = matcher.group(0);
               System.out.println(dateString);
            }
         }

      Also, have you considered using something like http://ocpsoft.org/prettytime/nlp/ PrettyTime NLP parser to do your date parsing? It might be more flexible than using Regexes for this, if you are trying to match many variants of dates.

      If this helped, please consider sharing this page on twitter/google/facebook! Thanks!

  26. HC says:

    Hello, I am trying to parse two kind of files using java regex. But not able find a suitable pattern. Need your help urgently.

    (1) Regular file has the text as below:

    So I'll show you where I keep them now, its all in this cupboard here 00.00.20

    I need group the text & timestamp separately.

    (2) SRT format to be read. The text is as below

    1
    00:00:10,500 --> 00:00:13,000
    Elephant's Dream
    
    2
    00:00:15,000 --> 00:00:18,000
    At the left we can see

    Need to basically read (1) and convert to type (2). If the file is already of type (2) take no action.

  27. nishan says:

    how to get the cause of mismatch while regex holds more than one condition to chek

  28. Alp says:

    Thank you for this page that nicely tells how to use regex, in a very understandable fashion.

    Cheers :)

  29. Niranjan says:

    I had never grasped Regex fully well in my college days and this tutorial helped me finally do that. Shows how much it is equally important to formulate tutorials perfectly — without superfluous notations, to the point and neat!

    Looking forward to other such tutorials. Thank you for this tutorial!

  30. Amak Yunus says:

    Good Job. I Like it

  31. Gianluca Elmo says:

    Very interesting and usefull tutorial.

    Thanks a lot!

    PS
    I have been trying to register for the forum. But until today I have not received any confirmation email yet. I would appreciate if you could help me with that. Thanks.

    1. Glad you like the tutorial and thanks for letting us know! Sent you an email.

  32. Muhadira says:

    hi, i need help.
    i want to take two words after ‘at’ word,
    for example, string: "I was at my uncle’s house with my brother".
    how can i get just "my uncle’s house"?
