July 22nd, 2013 by

Lincoln Baxter III

Guide to Regular Expressions in Java (Part 1)

Often unknown, or heralded as confusing, regular expressions (regex) have defined the standard for powerful text manipulation and search. Without them, many of the applications we know today would not function. This two-part series explores the basics of regular expressions in Java, and provides tutorial examples in the hopes of spreading love for our pattern-matching friends. (Read part two.)

Part 1: What are Regular Expressions?

Regular expressions are a language of string patterns built into most modern programming languages, including Java 1.4 onward. They can be used for searching, extracting, and modifying text. This chapter covers basic syntax and use.

This article is part one in the series: “Regular Expressions.” Read part two for more information on lookaheads, lookbehinds, and configuring the matching engine. To get a more visual look into how regular expressions work, try our visual Java regex tester. You can also watch a video to see how the visual regex tester works.

1. Syntax

Regular expressions, by definition, are string patterns that describe text. These descriptions can then be used in nearly infinite ways. The basic language constructs include character classes, quantifiers, and meta-characters.

1.1. Character Classes

Character classes are used to define the content of the pattern. E.g. what should the pattern look for?

.  	Dot, any character (may or may not match line terminators, read on)
\d  	A digit: [0-9]
\D  	A non-digit: [^0-9]
\s  	A whitespace character: [ \t\n\x0B\f\r]
\S  	A non-whitespace character: [^\s]
\w  	A word character: [a-zA-Z_0-9]
\W  	A non-word character: [^\w]

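Notice, however, that in a Java string literal you will need to “double escape” these backslashes: the compiler turns "\\d" into the two characters \d, which is what the regex engine needs to see.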

String pattern = "\\d \\D \\W \\w \\S \\s";
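As a minimal sketch of the double escaping in practice, the class below combines three of the character classes with `String.matches()`. The class name `CharClassDemo` and the method `isCountThenWord` are made up for this illustration.

```java
class CharClassDemo {
    // True when the whole input is: one or more digits, exactly one
    // whitespace character, then one or more word characters.
    public static boolean isCountThenWord(String input) {
        // "\\d" in Java source is the two-character regex \d.
        return input.matches("\\d+\\s\\w+");
    }

    public static void main(String[] args) {
        System.out.println(isCountThenWord("42 apples"));    // true
        System.out.println(isCountThenWord("forty apples")); // false: no leading digit
        System.out.println(isCountThenWord("42  apples"));   // false: two spaces, only one \s
    }
}
```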

1.2. Quantifiers

Quantifiers can be used to specify the number or length that part of a pattern should match or repeat. A quantifier will bind to the expression group to its immediate left.

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times
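The quantifiers above can be tried out with a short sketch like the following. It relies on `String.matches()`, which only succeeds when the whole input matches; the class name `QuantifierDemo` and the helper `whole` are assumptions for illustration.

```java
class QuantifierDemo {
    // Convenience wrapper: does the regex match the entire input?
    public static boolean whole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(whole("a*", ""));            // true: zero or more
        System.out.println(whole("a+", ""));            // false: needs at least one
        System.out.println(whole("colou?r", "color"));  // true: the 'u' is optional
        System.out.println(whole("\\d{3}", "1234"));    // false: exactly three digits
        System.out.println(whole("\\d{2,}", "1234"));   // true: at least two digits
        System.out.println(whole("\\d{2,3}", "1234"));  // false: at most three digits
        // A quantifier binds to whatever is immediately to its left:
        System.out.println(whole("ab+", "ababab"));     // false: '+' applies to 'b' only
        System.out.println(whole("(ab)+", "ababab"));   // true: '+' applies to the group
    }
}
```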

1.3. Meta-characters

Meta-characters are used to group, divide, and perform special operations in patterns.

\   	Escape the next meta-character (it becomes a normal/literal character)
^   	Match the beginning of the line
.   	Match any character (except newline)
$   	Match the end of the line (or before newline at the end)
|   	Alternation (‘or’ statement)
()  	Grouping
[]  	Custom character class
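Here is a small sketch of the meta-characters at work, using `Matcher.find()` so that a pattern may match anywhere in the input. The class name `MetaCharDemo` and the helper `found` are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class MetaCharDemo {
    // True when the regex matches somewhere inside the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("3.14", "3x14"));      // true: '.' matches any character
        System.out.println(found("3\\.14", "3x14"));    // false: escaped '.' is a literal dot
        System.out.println(found("^cat", "my cat"));    // false: 'cat' is not at the start
        System.out.println(found("cat$", "my cat"));    // true: 'cat' is at the end
        System.out.println(found("cat|dog", "hot dog")); // true: alternation
        System.out.println(found("gr[ae]y", "grey"));   // true: custom character class
        System.out.println(found("gr[ae]y", "groy"));   // false: 'o' is not in the class
    }
}
```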

2. Examples

2.1. Basic Expressions

Every string is a regular expression. For example, the string, “I lost my wallet”, is a regular expression that will match the text, “I lost my wallet”, and will ignore everything else. What if we want to be able to find more things that we lost? We can replace wallet with a character class expression that will match any word.

"I lost my \\w+"

As you can see, this pattern uses both a character class and a quantifier. “\w” says match a word character, and “+” says match one or more. So when combined, the pattern says “match one or more word characters.” Now the pattern will match any word in place of “wallet”. E.g. “I lost my sablefish”, “I lost my parrot”, but it will not match “I lost my: trooper”, because as soon as the expression finds the ":" character, which is not a word character, it will stop matching. If we want the expression to be able to handle this situation, then we need to make a small change.

"I lost my:? \\w+"

Now the expression will allow an optional ":" directly after the word ‘my’.

Try this example online with our Visual Java Regex Tester.
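To see the pattern run in Java, here is a minimal sketch that returns the first piece of text the pattern finds. The class name `LostItemDemo` and the method `firstMatch` are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LostItemDemo {
    private static final Pattern LOST = Pattern.compile("I lost my:? \\w+");

    // Returns the first match found in the input, or null when there is none.
    public static String firstMatch(String input) {
        Matcher m = LOST.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("I lost my sablefish"));              // I lost my sablefish
        System.out.println(firstMatch("I lost my: trooper"));               // I lost my: trooper
        System.out.println(firstMatch("Yesterday I lost my parrot again")); // I lost my parrot
        System.out.println(firstMatch("I lost my- car"));                   // null
    }
}
```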

2.2. Basic Grouping

An important feature of regular expressions is the ability to group sections of a pattern, and provide alternate matches.

|   	Alternation (‘or’ statement)
()  	Grouping

These two meta-characters are core parts of flexible regular expressions. For instance, in the first example we lost our wallet. What if we knew exactly which types of objects we had lost, and we wanted to find those objects but nothing else? We can use a group (), with an ‘or’ meta-character in order to specify a list of expressions to allow in our match.

"I lost my:? (wallet|car|cell phone|marbles)"

The new expression will now match the beginning of the string “I lost my”, an optional ":", and then any one of the expressions in the group, which are separated by the alternation character "|". Any one of ‘wallet’, ‘car’, ‘cell phone’, or ‘marbles’ would be a match.

"I lost my wallet"		matches
"I lost my wallets"		matches		the ‘s’ is not needed, is ignored
"I lost my: car"		matches
"I lost my- car"		doesn’t match	‘-‘ is not allowed in our pattern
"I lost my: cell"		doesn’t match	all of ‘cell phone’ is needed
"I lost my: cell phone"	matches
"I lost my cell phone"		matches
"I lost my marbles"		matches

Try this example online with our Visual Java Regex Tester

As you can see, the combinations for matches quickly become very large. This is not the complete set, as there are several more phrases that would match our simple pattern.

Quiz: Can you figure out all possible matches for this pattern? (See the answers.)

"I lost my:? (wallet|car|cell phone|marbles)"

Answer: This is a trick question! Because this regular expression is unanchored (it has no beginning `^` and no ending `$` meta-characters to terminate the match), the pattern we’ve created will actually match any string containing one of the results below. In short, there are nearly infinite possible matches. However, if we did want to limit our pattern to just these results, we could add the required terminators to our pattern, like so:

"^I lost my:? (wallet|car|cell phone|marbles)$"

"I lost my wallet"
"I lost my: wallet"
"I lost my car"
"I lost my: car"
"I lost my cell phone"
"I lost my: cell phone"
"I lost my marbles"
"I lost my: marbles"

2.3. Matching/Validating

Regular expressions make it possible to test whether text matches a certain pattern, and return a Boolean value depending on whether the pattern is found. This can be used to validate input such as phone numbers, social security numbers, email addresses, and web form data, to scrub data, and much more. E.g. if a string matches the pattern for an SSN, then the string can be treated as a well-formed SSN.

Sample code

import java.util.ArrayList;
import java.util.List;

public class ValidateDemo {
	public static void main(String[] args) {
		List<String> input = new ArrayList<String>();
		input.add("123-45-6789");
		input.add("9876-5-4321");
		input.add("987-65-4321 (attack)");
		input.add("987-65-4321 ");
		input.add("192-83-7465");


		for (String ssn : input) {
			if (ssn.matches("^(\\d{3}-?\\d{2}-?\\d{4})$")) {
				System.out.println("Found good SSN: " + ssn);
			}
		}
	}
}

This produces the following output:

Found good SSN: 123-45-6789
Found good SSN: 192-83-7465

Try this example online with our Visual Java Regex Tester

Dissecting the pattern:

"^(\\d{3}-?\\d{2}-?\\d{4})$"

^		match the beginning of the line
() 		group everything within the parenthesis as group 1
\d{n}		match n digits, where n is a number equal to or greater than zero
-?		optionally match a dash
$		match the end of the line

2.4. Extracting/Capturing

Specific values can be selected out of a large complex body of text. These values can be used in the application.

Sample code

import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;

public class ExtractDemo {
	public static void main(String[] args) {
		String input = "I have a cat, but I like my dog better.";

		Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)");
		Matcher m = p.matcher(input);

		List<String> animals = new ArrayList<String>();
		while (m.find()) {
			System.out.println("Found a " + m.group() + ".");
			animals.add(m.group());
		}
	}
}

This produces the following output:

Found a cat.
Found a dog.

Try this example online with our Visual Java Regex Tester

Dissecting the pattern:

"(mouse|cat|dog|wolf|bear|human)"

()		group everything within the parenthesis as group 1
mouse		match the text ‘mouse’
|		alternation: match any one of the sections of this group
cat		match the text ‘cat’
 
//...and so on

2.5. Modifying/Substitution

Values in text can be replaced with new values. For example, you could replace all instances of the text ‘clientId=’, followed by a number, with a mask to hide the original value (see below). For sanitizing log files, URI strings and parameters, and form data, this can be a useful method of filtering sensitive information. A simple, reusable utility class can be used to encapsulate this into a more streamlined method.

Sample code

import java.util.regex.*;

public class ReplaceDemo {
	public static void main(String[] args) {
		String input = 
                  "User clientId=23421. Some more text clientId=33432. This clientNum=100";

		Pattern p = Pattern.compile("(clientId=)(\\d+)");
		Matcher m = p.matcher(input);

		StringBuffer result = new StringBuffer();
		while (m.find()) {
			System.out.println("Masking: " + m.group(2));
			m.appendReplacement(result, m.group(1) + "***masked***");
		}
		m.appendTail(result);
		System.out.println(result);
	}
}

This produces the following output:

Masking: 23421
Masking: 33432
User clientId=***masked***. Some more text clientId=***masked***. This clientNum=100

Try this example online with our Visual Java Regex Tester
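When the per-match logging is not needed, the same masking can probably be written more compactly with `String.replaceAll()`, where `$1` in the replacement refers back to group 1. This is a sketch of an alternative, not the approach used in the sample above; the class name `ReplaceAllDemo` and the method `mask` are made up for illustration.

```java
class ReplaceAllDemo {
    // Keeps the "clientId=" prefix (group 1) and replaces the digits after it.
    public static String mask(String input) {
        return input.replaceAll("(clientId=)\\d+", "$1***masked***");
    }

    public static void main(String[] args) {
        String input =
            "User clientId=23421. Some more text clientId=33432. This clientNum=100";
        System.out.println(mask(input));
        // User clientId=***masked***. Some more text clientId=***masked***. This clientNum=100
    }
}
```

Note that `clientNum=100` is left untouched because it does not match the literal text ‘clientId=’.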

Dissecting the pattern:

"(clientId=)(\\d+)"

(clientId=) 	group everything within the parenthesis as group 1
clientId=	match the text ‘clientId=’
(\\d+)		group everything within the parenthesis as group 2
\\d+		match one or more digits

Notice how groups begin numbering at 1, and increment by one for each new group. However, groups may contain groups, in which case groups are numbered by the position of their opening parenthesis, from left to right: the outer group comes first, then the groups nested inside it. Group 0 always refers to the entire chunk of text that matched the regex.

(  ( ) (  ( ) ( ))) ( )	//and so on
 1  2   3  4   5     6		//0 = everything the pattern matched
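The numbering can be checked with a short sketch that prints every group of a match. The nested SSN-style pattern used here is an assumption chosen for illustration, as are the class name `GroupNumberDemo` and the method `groups`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Returns group 0..n of a whole-input match, or an empty array if no match.
    public static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Group 1 wraps groups 2 and 3; group 4 stands alone.
        String[] g = groups("((\\d{3})-(\\d{2}))-(\\d{4})", "123-45-6789");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + " = " + g[i]);
        }
        // 0 = 123-45-6789
        // 1 = 123-45
        // 2 = 123
        // 3 = 45
        // 4 = 6789
    }
}
```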

3. Conclusion & Next Steps

Wrapping up, regular expressions are not difficult to master – in fact, they are quite easy. My strategy, whenever building a new regular expression, is to start with the simplest, most general match possible. From there, I continuously add more and more complexity until I have matched, substituted, or inserted exactly what I need.

Don’t be afraid to “express” yourself! When you’ve got the hang of these techniques, or need something a little fancier, read part two for more information on lookaheads, lookbehinds, and configuring the matching engine.

Posted in OpenSource

/**
 * Test for category 2000, state 11 in a state mask
 */
public void testStateMask1() {
	String regex = RegexTrainer.stateMask1;
	try {
		assertTrue(regex != null && regex.length() > 0);
		assertTrue("Didn't match 2000011", Pattern.matches(regex, "2000011"));
		assertTrue("Didn't match 19000012000011", Pattern.matches(regex, "19000012000011"));
		assertTrue("Didn't match 190000120000112100001", Pattern.matches(regex, "190000120000112100001"));
		assertFalse("Matched 2000010", Pattern.matches(regex, "2000010"));
		assertFalse("Matched 010000112000011300001", Pattern.matches(regex, "010000112000011300001"));
	} catch(Exception e) {
		logger.error(e);
		fail();
	}
}

Pattern pattern=Pattern.compile(p.content);
Matcher matcher=pattern.matcher("(January|February|March|April|May|June|July|August|September|October|November|December)\\s([1-9]|[1-2]\\d|3[0-1]),\\s(\\d{3,4})");
if(matcher.find())
	String dateString=matcher.group(0);

@Test
public void test() {
	String string = "Elvis Aaron Presley (January 8, 1935 – August 16, 1977) was one of first and most famous American rock and roll’s superstars. His fame lasted long after his death. He was also an actor who starred in many movie";
	Pattern pattern = Pattern.compile("(January|February|March|April|May|June|July|August|September|October|November|December)\\s([1-9]|[1-2]\\d|3[0-1]),\\s(\\d{3,4})");
	Matcher matcher = pattern.matcher(string);
	if (matcher.find()) {
		String dateString = matcher.group(0);
		System.out.println(dateString);
	}
}

How-To: Regular Expressions in Java (Part 2) - Tutorial | Examples | OcpSoft says:

January 11, 2010 at 9:55 am

[…] provides tutorial examples in the hopes of spreading love for our pattern-matching friends. (Read part one.) Part 2: Look-ahead & Configuration flagsHave you ever wanted to find something in a string, […]


Gene De Lisa says:

March 2, 2010 at 8:32 am

Good job. Readable. Understandable. Clear examples. No unnecessary chest beating.

Keep it up.

December 13, 2010 at 12:45 pm

[…] and look-behind operations use syntax that could be confused with grouping (See Ch. 1 – Basic Grouping,) but these patterns do not capture values; therefore, using these constructs, no values will be […]

Nitin Gautam says:

March 5, 2011 at 8:04 pm

Thanks

param says:

October 28, 2011 at 2:29 am

I have to write a regular expression in java for the following test case:

I have written the following regular expression but its working only for assertTrue

/**
* Test for category 2000, state 11 in a state mask
*/
public static String stateMask1 = "^((\\d{7}){1,3})$";

Lincoln Baxter III says:

October 28, 2011 at 11:54 am

Hi there,

Your regex matches a group of Seven(7) digits, between 1 and 3 times. This means that it matches Seven, Fourteen, or Twenty-one digits in a row, so all of the match operations you have listed above will succeed.

It does nothing other than that. I’m not exactly sure what you need it to do based on your description of the problem.

I hope this helps,
~Lincoln

Move URL parameter position « OcpSoft Support Forums says:

January 6, 2012 at 3:10 pm

[…] http://ocpsoft.com/opensource/guide-to-regular-expressions-in-java-part-1/ Posted 1 year ago # […]

Question about redirect with pretty faces « OcpSoft Support Forums says:

January 6, 2012 at 3:13 pm

[…] are some regex tutorials: http://ocpsoft.com/opensource/guide-to-regular-expressions-in-java-part-1/ http://ocpsoft.com/opensource/guide-to-regular-expressions-in-java-part-2/ Posted 11 months ago […]

[solved] DynaView doesn't work with #{resource['x']} « OcpSoft Support Forums says:

January 6, 2012 at 5:48 pm

[…] Lincoln Baxter III Admin Also, that EL pattern is correct. Square brackets in regular expressions denote custom character classes. See reference: (http://ocpsoft.com/opensource/guide-to-regular-expressions-in-java-part-1/#charclasses) […]

Ygor Fonseca says:

January 27, 2012 at 11:06 am

Hi, i would like to make a pattern that attend this:

/event/anything/eventId/

Ex: /event/coldplay-18-03-2012-new-york/123/

Lincoln Baxter III says:

February 24, 2012 at 5:54 am

Sounds like you want to do some URL-rewriting? There are a few libraries out there to do this:

http://ocpsoft.com/prettyfaces/
http://ocpsoft.com/rewrite/

But if you just want a regular expression to match this type of URL, the following should do the trick:
String pattern = "/event/[^/]+/\\d+/";

MrBCut says:

March 5, 2012 at 12:24 am

Hi LBIII

Thanks so much for this tut. Best one out there! Regex can be a tricky concept to get (and I bet explain), so I am very much appreciative for your help! I appreciate your efforts wholeheartedly!

Bookmarked! lol

Lincoln Baxter III says:

March 28, 2012 at 8:36 pm

You’re welcome! Glad you found it useful!


Paul says:

March 29, 2012 at 1:57 pm

Hi,

Really cool stuff.

Just to say it, but I think that there is a difference between #matches and #find method since the #matches method will always try to match the whole input while #find method will do it for any part of the input. You could say that #matches is like a #find but adding “^” and “$” char at the beginning and the end of the input.

So, in the sample 2.3 :

if (ssn.matches("^(\\d{3}-?\\d{2}-?\\d{4})$"))

“^” and “$” char are not required. And in 2.2 sample, the input “I lost my wallets” will only matches for the #find method, not for the #matches one (all other examples are fine with both methods).

Anyone correct me if I’m wrong.
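The difference described in this comment can be checked with a short sketch. It uses the grouping pattern from section 2.2; the class name `FindVsMatchesDemo` and its helper methods are made up for illustration.

```java
import java.util.regex.Pattern;

class FindVsMatchesDemo {
    public static final String LOOSE = "I lost my:? (wallet|car|cell phone|marbles)";
    public static final String STRICT = "^I lost my:? (wallet|car|cell phone|marbles)$";

    // find(): succeeds when the pattern matches any part of the input.
    public static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // matches(): succeeds only when the pattern matches the whole input.
    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String plural = "I lost my wallets";
        System.out.println(find(LOOSE, plural));     // true: the trailing 's' is ignored
        System.out.println(matches(LOOSE, plural));  // false: the 's' is left over
        System.out.println(find(STRICT, plural));    // false: '$' forbids the extra 's'
        System.out.println(matches(LOOSE, "I lost my: car"));  // true
        System.out.println(matches(STRICT, "I lost my: car")); // true: anchors are redundant here
    }
}
```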

TestConfig says:

August 29, 2012 at 6:06 am

Hello Lincoln Baxter III,

Very nice example, simple to understand :).
Great work, keep it up.

vaidhyanathan says:

August 29, 2012 at 1:23 pm

I want to write a password pattern to match 3 out of below 4 criteria’s a :-
1) Password must contain atleast 1 numeric value
2) Password must contain atleast 1 lower-case letter
3) Password must contain atleast 1 upper-case letter
4) Password must contain atleast 1 of these special characters !”#$%&'()*+,./;:=?_@>-
Please help me in implementing the same

Lincoln Baxter III says:

August 29, 2012 at 1:58 pm

While it would seem tempting to implement this using a single regular expression (which is certainly possible), I would recommend splitting this up into 4 individual checks, with unit tests for each check.

In this situation, clarity should be preferred over brevity, and the regular expression you want to construct will be a bit opaque if you attempt a one-liner. Performance is not really an issue for something like this (unless you have some strange requirements or expectations).

This is really pretty easy, so I’ll give you this code under one condition – you have to post a link on a blog back to this article! 😉
```
public boolean passwordValidates( String pass ) {
   int count = 0;

   if( pass.matches(".*\\d.*") )
      count ++;
   if( pass.matches(".*[a-z].*") )
      count ++;
   if( pass.matches(".*[A-Z].*") )
      count ++;
   if( pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*") )
      count ++;

   return count >= 3;
}
```

August 30, 2012 at 12:36 pm

public class Test {}

vaidhyanathan says:

August 31, 2012 at 12:32 am

Thanks for the above code. There is one scenario where the above code will break: if 3 criteria are matched and a special character is entered that is not in the allowed list, this code will return true. For example, in aA123@~ 3 criteria are matched but ‘~’ is not in the allowed list. As per the requirement, if 3 criteria are matched and any special character outside the allowed list is entered, then the password should not be allowed. Do you have a snippet for this scenario?

1. vaidhyanathan says:
  
  August 31, 2012 at 10:35 am
  
  Basically,Im looking if there’s a regular expression for finding the blacklisted special characters.
2. vaidhyanathan says:
  
  August 31, 2012 at 11:22 am
  
  I just tried this one ,it works fine (not tested all the cases).If you have a better approach please suggest me .I would sure post a link back :).
```
public boolean passwordValidates( String pass ) {
   int count = 0;
   boolean pattern=true;
   if( pass.matches(".*\\d.*") )
      count ++;
   if( pass.matches(".*[a-z].*") )
      count ++;
   if( pass.matches(".*[A-Z].*") )
      count ++;
   if( pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*") )
      count ++;
	if (count == 3)
	{
		pattern=pass.matches("^[a-zA-Z0-9\\s!\"#$%&'()*+,./;:=?_@>-]{8}$");
		return pattern;
	}

	if( pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*") )//Check if there's a special character
	{
	  pattern=pass.matches("^[a-zA-Z0-9\\s!\"#$%&'()*+,./;:=?_@>-]{8}$");
	  count ++;
	}
   return (count >= 3&&pattern);
}
```
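One possible alternative, sketched here under assumptions: instead of searching for blacklisted characters, first check that every character belongs to the allowed set, then count the criteria as before. The class name `PasswordWhitelist` is hypothetical, and this version deliberately leaves out the whitespace and fixed length of 8 that the code above allows, since those were not part of the stated requirements.

```java
class PasswordWhitelist {
    public static boolean passwordValidates(String pass) {
        // Reject the password outright if any character is outside the allowed set.
        if (!pass.matches("[a-zA-Z0-9!\"#$%&'()*+,./;:=?_@>-]+")) {
            return false;
        }
        int count = 0;
        if (pass.matches(".*\\d.*")) count++;      // at least one digit
        if (pass.matches(".*[a-z].*")) count++;    // at least one lower-case letter
        if (pass.matches(".*[A-Z].*")) count++;    // at least one upper-case letter
        if (pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*")) count++; // at least one special
        return count >= 3;
    }

    public static void main(String[] args) {
        System.out.println(passwordValidates("aA123@"));  // true
        System.out.println(passwordValidates("aA123@~")); // false: '~' is not allowed
        System.out.println(passwordValidates("aaaa"));    // false: only one criterion met
    }
}
```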

Steve says:

August 30, 2012 at 9:51 pm

Thanks, Great examples!! : )

PS: would you be able to add ‘boundary matchers’ (?) to your syntax section and some supporting examples to suit by any chance, I’m reasonably new at this and was reading about them here (url below) but your example format is more comprehensive and much easier to understand (it doesn’t leave anything out : )
http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html

Keep up the good work!

Lincoln Baxter III says:

August 30, 2012 at 10:15 pm

Hey Steve 🙂

You bet! I’ll put something up over the weekend – it should be no problem.

Thanks for the motivation!
~Lincoln

Lincoln Baxter III says:

September 4, 2012 at 7:09 pm

Taking a little longer than I thought 🙂 had a fun weekend though!


Lincoln A Baxter (Lincoln's Dad) says:

September 19, 2012 at 7:51 am

I would suggest that references be provided to native regular expression man pages. I would think that the way to really understand regexes would be to understand native regexes as they would be used in sed or egrep or other standard Unix utilities, and then understand what the Java library limitations are, if any. I note that just about the first thing you do in part one is talk about the escaping for \ backslash in pattern definition strings. I don’t know if there would be a way to do this, given that it appears that operator overloads are not possible in Java, but I think a useful capability would be a Java library/module (like prettytime) that linguistically overloads, say, the tick (single quote) or slash character so that regular expressions could be defined, and be easier to read in Java and consistent with other non-Java examples (cf. the Perl slash (/pattern/) regex delimiter, which is the same as that used by standard Unix utilities like sed). I would think this would make generic regex man pages much more useful to the Java user, increase the readability of regexes in Java, and as a result, maybe increase the general understanding and sophistication of regex usage by Java programmers.

Lincoln Baxter III says:

September 19, 2012 at 12:42 pm

Unfortunately, operator overloads are not possible in Java, and there is no way to override the default behavior of the escape character in string literals, but I agree, it would be nice 🙂

Since this is a Java-targeted article, I did link to http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html – the official regex docs. I think linking to man pages here would end up being confusing because the syntax is subtly different regarding escaping and configuration.

In general, Java regexps are a full implementation of Unix regex, but not as comprehensive as say, PCRE in a few ways. This can, however, be made up for via programmatic usage of the Pattern and Matcher classes.


JP says:

November 4, 2012 at 1:24 am

Great article. By the way String.matches() is another quick way to apply regular expression in Java.

Paraj says:

November 29, 2012 at 4:29 am

Hi,
I have a string like "John’s classes are nice || john teach nicely".

when this string value passes to Jsp, where I use single quote ‘, breaks my page.

I want to write an expression which add \ before ‘ i.e. "John\’s classes are nice \|\| john teach nicely"

Can you please help me in this.

rosy says:

December 4, 2012 at 4:53 am

I am new to the RegEx. Is there anyway to write a pattern which says that it is a start of a group.
987 x. 5,6(anything) (xyz). can we write a pattern which says that start of group is (x.)

Rachel says:

December 27, 2012 at 10:53 pm

I need a regular expression for finding a missing parenthesis within text. I have the following that finds a missing end parenthesis: \((?!.+?\)), but no matter what I try, I can’t figure out how to find the missing beginning one. Got any ideas?

Lincoln Baxter III says:

December 28, 2012 at 2:36 pm

Honestly, regular expressions are probably not the best solution here. You’re probably better off using a cursor-based matching algorithm.

Iterate over the array and keep a count of how many parens you encounter. It’s hard for a regex to do this because the regex engine doesn’t have the concept of keeping count of occurrences (at least that you can access.)
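A minimal sketch of that counting approach might look like the following. It tracks the positions of open parentheses rather than a bare count so that the offending index can be reported; the class name `ParenCheck` and the method `firstUnbalanced` are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ParenCheck {
    // Returns the index of the first ')' that has no matching '(' before it,
    // or, failing that, the index of the earliest '(' that is never closed.
    // Returns -1 when all parentheses are balanced.
    public static int firstUnbalanced(String text) {
        Deque<Integer> open = new ArrayDeque<Integer>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                open.push(i);
            } else if (c == ')') {
                if (open.isEmpty()) {
                    return i; // missing beginning parenthesis
                }
                open.pop();
            }
        }
        // Anything left on the stack is missing its end parenthesis.
        return open.isEmpty() ? -1 : open.peekLast();
    }

    public static void main(String[] args) {
        System.out.println(firstUnbalanced("(a)(b)")); // -1
        System.out.println(firstUnbalanced("a) (b"));  // 1
        System.out.println(firstUnbalanced("(a (b)")); // 0
    }
}
```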

1. Rachel says:
  
  December 28, 2012 at 3:29 pm
  
  Thanks and I understand it’s not the best. But, is there a possible reverse expression that could be worked out (reverse of the one I gave in my original question)? I would think you could do a negative look behind. I just couldn’t figure out where to put the parentheses in the equation. Is it possible?

Sachin says:

January 1, 2013 at 4:06 am

Hi Lincoln,
Great Information.

I want to do something reverse of this. I have to make my password rule to be configure by user through property file with RegEx value, then i have to validate the password value against configured RegEx. I am success ed to validate it but now i have to also show, what is correct password format to the user so that he can correct it accordingly, how can i parse regex & find that it looks for n number of special char, n number of upper case alphabet, n number of numeric character?

Thanks,
-Sachin.

Lincoln Baxter III says:

January 1, 2013 at 12:43 pm

I don’t think you really want to try to parse the regex. For passwords, I generally find that using several separate regexes in independent if() blocks -instead of one big regex- allows for more control of the parsing and error reporting.
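As a rough sketch of that idea, each rule below is checked independently and the unmet ones are collected so they can be shown to the user. The class name `PasswordRules`, the method `missingRules`, and the message texts are all hypothetical; the rules themselves are borrowed from the password example earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

class PasswordRules {
    // Returns a description of every rule the password fails to satisfy.
    public static List<String> missingRules(String pass) {
        List<String> missing = new ArrayList<String>();
        if (!pass.matches(".*\\d.*")) {
            missing.add("a digit");
        }
        if (!pass.matches(".*[a-z].*")) {
            missing.add("a lower-case letter");
        }
        if (!pass.matches(".*[A-Z].*")) {
            missing.add("an upper-case letter");
        }
        if (!pass.matches(".*[!\"#$%&'()*+,./;:=?_@>-].*")) {
            missing.add("a special character");
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(missingRules("abc"));  // [a digit, an upper-case letter, a special character]
        System.out.println(missingRules("aA1!")); // []
    }
}
```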

1. Sachin says:
  
  January 1, 2013 at 11:03 pm
  
  Thanks for the reply. It make sense to split it in multiple groups to have better control on error reporting. But still i would give a try, may be i can put a constraint on configuration, that RegEx need to have in specific sequence of groups & then i can split this string & find the information inside these groups to form the dynamic error message.
  
  If it won’t work then i will go with your suggestion.
  
  Thanks,
  -Sachin.

Ria says:

March 2, 2013 at 8:51 am

Hi, can you give me a java code for my assignment. for example the input is Regular expression Alphabet then the output will be the set of string generated by regular expression over the alphabet.
e.g. input alphabet {a,b} regular expression : b+ output : b, bb,bbb, bbbb….
or the other e.g
input: (a || b) b+
output: {a, b)

Lincoln Baxter III says:

June 5, 2013 at 4:20 pm

Sorry, I won’t help with homework, but you can try out our visual Java regex tester, which might help you out:

http://ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/


Mayur says:

May 3, 2013 at 1:25 am

if i want to detect some characters those are inside round bracket,excludes round bracket..
How to perform this operation?

Lincoln Baxter III says:

June 5, 2013 at 4:22 pm

Try this:
```
\\(([^)]+)\\)
```
Notice that the parentheses must be escaped with a double backslash in a Java string literal: the compiler turns `\\(` into `\(`, which the regex engine then reads as a literal parenthesis instead of the start of a group. The inner, unescaped parentheses form group 1, which holds the text between the round brackets. Confusing, I know.


Simon says:

June 5, 2013 at 10:40 am

i have a List as follows:

i now want to access these indexes as follows:

int index1 = x.indexOf("Date : Aug 23, 2010");
//1

int index2 = x.indexOf("Date : Jul 15, 2010");
//0

int index3 = x.indexOf("Date : Jul 15, 2010");
//2

Any help? Thank’s in advance. Am also a Java enthusiast and Open Source Developer.

June 5, 2013 at 10:43 am

I made a mistake. It should be:

int index3 = x.lastIndexOf("Date : Jul 15, 2010");
//2

Lincoln Baxter III says:

June 5, 2013 at 4:26 pm

I’m not really sure what you mean by "access the index". If you want to find indexes, then the approach you used works fine. However, if what I think you are really asking is to find the date in each element, then you could do something like this:
```
List<String> x = new ArrayList<String>();
x.add("Date : Jul 15, 2010 Income : 8500 Expenses : 0");
x.add("Date : Aug 23, 2010 Income : 0 Expenses : 6500");
x.add("Date : Jul 15, 2010 Income : 0 Expenses : 4500");
```
```
Date : (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, \d{4}
```
I used our Regex tester to figure this out: http://ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/
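Putting the list and the pattern together in Java could look something like this sketch. An extra pair of parentheses is added around the date so that group 1 holds just the date text; the class name `DateExtractDemo` and the method `extractDate` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractDemo {
    // Group 1 is the full date, group 2 is the month abbreviation.
    private static final Pattern DATE = Pattern.compile(
        "Date : ((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \\d{1,2}, \\d{4})");

    // Returns the date found in the line, or null if the line has none.
    public static String extractDate(String line) {
        Matcher m = DATE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        List<String> x = new ArrayList<String>();
        x.add("Date : Jul 15, 2010 Income : 8500 Expenses : 0");
        x.add("Date : Aug 23, 2010 Income : 0 Expenses : 6500");
        x.add("Date : Jul 15, 2010 Income : 0 Expenses : 4500");

        for (int i = 0; i < x.size(); i++) {
            System.out.println(i + ": " + extractDate(x.get(i)));
        }
        // 0: Jul 15, 2010
        // 1: Aug 23, 2010
        // 2: Jul 15, 2010
    }
}
```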


navneet singh says:

June 8, 2013 at 2:40 pm

Try & find regular expression for 3 text pattern given below
1.) 404-5674
2.) 080 banaglore
3.) Cloudjini Technology

Lincoln Baxter III says:

June 17, 2013 at 1:12 pm

Huh? I’m not understanding your question. I’ll have to consider this SPAM, and delete it unless you clarify 🙂 Thanks!


Carla says:

June 18, 2013 at 9:57 am

Here is my java code to match a certain date format in a sentence with this regular expression:

(January|February|March|April|May|June|July|August|September|October|November|December)\s([1-9]|[1-2]\d|3[0-1]),\s(\d{3,4})

Here is an example of the sentence i’m testing it on:
Elvis Aaron Presley (January 8, 1935 – August 16, 1977) was one of first and most famous American rock and roll’s superstars. His fame lasted long after his death. He was also an actor who starred in many movie

I tested it with many online testers and it works here is the result:
start() = 21, end() = 36
group(0) = "January 8, 1935"
group(1) = "January"
group(2) = "8"
group(3) = "1935"

but in java matcher.find() the result is false using this same regex

Please help me know why it isn’t working in java

June 20, 2013 at 12:03 pm

First, I think you are using Pattern and Matcher backwards. I think you should be doing this instead:

Also, have you considered using something like http://ocpsoft.org/prettytime/nlp/ PrettyTime NLP parser to do your date parsing? It might be more flexible than using Regexes for this, if you are trying to match many variants of dates.

If this helped, please consider sharing this page on twitter/google/facebook! Thanks!

HC says:

July 15, 2013 at 7:35 am

Hello, I am trying to parse two kind of files using java regex. But not able find a suitable pattern. Need your help urgently.

(1) Regular file has the text as below:

So I'll show you where I keep them now, its all in this cupboard here 00.00.20

I need group the text & timestamp separately.

(2) SRT format to be read. The text is as below

1
00:00:10,500 --> 00:00:13,000
Elephant's Dream

2
00:00:15,000 --> 00:00:18,000
At the left we can see

Need to basically read (1) and convert to type (2). If the file is already of type (2) take no action.

nishan says:

July 24, 2013 at 4:27 am

how to get the cause of mismatch while regex holds more than one condition to chek

Alp says:

August 13, 2013 at 3:51 am

Thank you for this page that nicely tells how to use regex, in a very understandable fashion.

Cheers 🙂

Niranjan says:

August 30, 2013 at 9:13 am

I had never grasped Regex fully well in my college days and this tutorial helped me finally do that. Shows how much it is equally important to formulate tutorials perfectly — without superfluous notations, to the point and neat!

Looking forward to other such tutorials. Thank you for this tutorial!

Amak Yunus says:

January 3, 2014 at 8:54 am

Good job. I like it.

Gianluca Elmo says:

June 3, 2014 at 3:04 am

Very interesting and useful tutorial.

Thanks a lot!

PS
I have been trying to register for the forum, but I have not yet received a confirmation email. I would appreciate it if you could help me with that. Thanks.

Lincoln Baxter III says:

June 3, 2014 at 2:52 pm

Glad you like the tutorial and thanks for letting us know! Sent you an email.

[reply]

Muhadira says:

June 13, 2014 at 6:09 am

Hi, I need help.
I want to take two words after the word ‘at’.
For example, given the string "I was at my uncle’s house with my brother",
how can I get just "my uncle’s house"?

Venkat says:

June 25, 2014 at 2:11 am

Hi,

I need to use a (wildcard) regular expression on the content of a file.

My requirement is to find words using wildcard matches.

Example:
If I give the input search*, then

I need to search the entire file and find search and searching as results.

That is, I need the wildcard matches of search.
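
One possible sketch of this: translate the wildcard into a regex and collect the matching words. It assumes that * should stand for zero or more word characters and that the file's contents have already been read into a String; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardSearchDemo {
    // Turns a wildcard such as "search*" into a regex: literal parts are quoted,
    // each '*' becomes \w*, and the whole thing must be a complete word.
    static Pattern toPattern(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps a trailing empty part
        StringBuilder regex = new StringBuilder("\\b");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("\\w*");
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i]));
            }
        }
        return Pattern.compile(regex.append("\\b").toString());
    }

    // Returns every word in the text that matches the wildcard, in order.
    static List<String> findAll(String text, String wildcard) {
        List<String> hits = new ArrayList<>();
        Matcher m = toPattern(wildcard).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("We search, and keep searching; research too.", "search*"));
    }
}
```

Because of the leading word boundary, "research" is not reported for search*; dropping the first \b would change that.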

Deme says:

August 8, 2014 at 7:59 am

How can I create a regex to match DE + space + a number from 1 to 99 + "," + SF + another number from 1 to 99 + "=" + space, and finally capture the alphanumeric value I am interested in, ignoring everything after the next space?

DE 3, SF 1 = 00 words which I want to ignore
DE 22, SF 1 = 5 words which I want to ignore
DE 22, SF 4 = 1 words which I want to ignore
DE 23, SF 1 = 0 words which I want to ignore
DE 25, SF 6 = A2 words which I want to ignore
DE 22, SF 7 = 0 words which I want to ignore
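
A minimal sketch for lines shaped like the samples above, assuming single spaces exactly as shown and a value made only of letters and digits. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeSfDemo {
    // "DE <1-99>, SF <1-99> = <value>" at the start of the line.
    // [1-9]\d? accepts 1 to 99 without a leading zero; group 3 is the value.
    static final Pattern ENTRY =
            Pattern.compile("^DE ([1-9]\\d?), SF ([1-9]\\d?) = ([A-Za-z0-9]+)");

    // Returns the alphanumeric value, or null if the line does not fit.
    static String value(String line) {
        Matcher m = ENTRY.matcher(line);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(value("DE 25, SF 6 = A2 words which I want to ignore"));
    }
}
```

The value group stops at the first character that is not a letter or digit, so everything from the next space onward is ignored.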

caglar sekmen says:

October 15, 2014 at 8:05 am

What does this mean?

.split("(?<=\\W)")

Lincoln Baxter III says:

October 15, 2014 at 5:39 pm

It means split the String on which this method was called, using the provided pattern as the split point. I suggest reading the docs on `String.split()`. For more information about lookaheads and lookbehinds, read part two of this article series.
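
As a small illustration, the lookbehind (?<=\\W) matches the empty position just after any non-word character, so the string is cut after each such character and the character stays attached to the piece before it. The class name and sample input are made up for this sketch.

```java
import java.util.Arrays;

class SplitLookbehindDemo {
    // Splits after every non-word character, keeping that character
    // at the end of the preceding piece.
    static String[] splitAfterNonWord(String input) {
        return input.split("(?<=\\W)");
    }

    public static void main(String[] args) {
        // The pieces are "a,", "b " and "c".
        System.out.println(Arrays.toString(splitAfterNonWord("a,b c")));
    }
}
```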

[reply]

Rajesh says:

December 30, 2014 at 7:09 am

This is a very nice article, but I have a question: I want to delete all the documents within a directory except the files with the extension .xml. Can you please give me the regular expression for that?

Jason Engler says:

January 9, 2015 at 10:19 am

You should use:
```
# Delete every regular file below the current directory whose name does not end in .xml
find . -type f ! -name '*.xml' -exec rm -f {} +
```
Hope this helps.

[reply]

Ankit says:

January 20, 2015 at 5:43 am

String regex="^([a-zA-Z0-9,\./<>\?;’:""[\]\\{}\|`~!@#\$%\^&\*()-_=\+]*)$");

I want to make a regex for all the characters of the keyboard, so that if something comes up later, we can remove it from that regex.

The code above shows an error. Can you please provide a solution or a regex for this?
I am using this code to match against a string:
Pattern.compile(regex);
Pattern.matches(regex, inputStr);
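
The regex line above is unlikely to compile as written: inside a Java string literal a double quote has to be written as \" and a backslash as \\, and there is a stray ) before the semicolon. A minimal sketch of one way to write such a character class follows; the exact set of allowed characters is an assumption based on the question, and the class and method names are hypothetical. Inside a character class most punctuation needs no escaping, and placing the hyphen last keeps it from forming a range.

```java
import java.util.regex.Pattern;

class KeyboardCharsDemo {
    // Letters, digits and common keyboard punctuation. Only the brackets and the
    // backslash are escaped in the regex; the hyphen comes last so it is literal.
    static final Pattern ALLOWED = Pattern.compile(
            "^[a-zA-Z0-9,./<>?;':\"\\[\\]\\\\{}|`~!@#$%^&*()_=+-]*$");

    // True if every character of the input is in the allowed set.
    static boolean isAllowed(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("a-b_[c]{d}"));
        System.out.println(isAllowed("a b")); // space is not in the class
    }
}
```

Compiling the pattern once and reusing it avoids the separate Pattern.compile and Pattern.matches calls, since Pattern.matches compiles the regex again on every call.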

GabLegCA says:

February 12, 2015 at 11:54 am

Best regex tutorial ever!

regexEnthu says:

September 4, 2015 at 7:14 pm

Hi,
this is the program I am working on…

String line = "/><script>alert%283869%29</script>.html";

String pattern = "/><\\.*?</.*?>/.?";

Pattern r = Pattern.compile(pattern);
java.util.regex.Matcher m = r.matcher(line);

if (m.find()) {
    System.out.println("Found value: " + m.group(0));
}

I am not able to figure out the proper pattern for this line. Can you please help?

Lincoln Baxter III says:

September 14, 2015 at 9:06 am

It is likely that your issue is related to matching ‘\’. In Java you are required to "double escape" backslashes. Try this:

String pattern = "/><\\\\.*?</.*?>/.?";

Note the four ‘\’ characters.
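
To illustrate the escaping: in Java source, "\\\\" is the two-character regex \\ and matches one literal backslash, while "\\." is the regex \. and matches a literal dot. The sample line in the question contains neither a backslash nor a dot right after "/><", so the sketch below also includes a variant with an unescaped dot. That variant assumes the goal is to find "/><", any text, and then a closing tag. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptTagDemo {
    // Four backslashes in source = regex \\ = one literal backslash in the input.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    // Assumed intent: "/><", any text (lazy), then a closing tag such as </script>.
    static final Pattern TAG = Pattern.compile("/><.*?</.*?>");

    // Returns the first match of the pattern in the line, or null.
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        String line = "/><script>alert%283869%29</script>.html";
        System.out.println("Found value: " + firstMatch(TAG, line));
    }
}
```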

[reply]

Mansour says:

October 14, 2015 at 4:04 am

Hello,
Thanks for the good information. I think there is a mistake in one of the examples above!
For the pattern "^I lost my:? (wallet|car|cell phone|marbles)$" you listed the following strings as matches:
"I lost my wallet"
"I lost my wallets"
"I lost my: wallet"
"I lost my: wallets"
"I lost my car"
"I lost my car"
"I lost my: car"
"I lost my: car"
"I lost my cell phone"
"I lost my cell phone"
"I lost my: cell phone"
"I lost my: cell phone"
"I lost my marbles"
"I lost my marbles"
"I lost my: marbles"
"I lost my: marbles"

But the second and fourth strings do not match, because they have an extra character ‘s’ at the end, which the pattern does not allow 🙂
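
This can be checked with a short sketch (the class and method names are made up for illustration). Because the pattern is anchored with ^ and $, both a full match and a find() reject the strings with the trailing ‘s’.

```java
import java.util.regex.Pattern;

class LostItemsDemo {
    static final String REGEX = "^I lost my:? (wallet|car|cell phone|marbles)$";
    static final Pattern PATTERN = Pattern.compile(REGEX);

    // The whole input must match the pattern.
    static boolean matches(String s) {
        return PATTERN.matcher(s).matches();
    }

    // The pattern may match anywhere; the ^ and $ anchors still apply.
    static boolean found(String s) {
        return PATTERN.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "I lost my wallet", "I lost my wallets",
                            "I lost my: wallet", "I lost my: wallets" };
        for (String s : inputs) {
            System.out.println("\"" + s + "\" matches: " + matches(s));
        }
    }
}
```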

Akshay says:

July 29, 2016 at 5:45 pm

Hi, check this link http://stackoverflow.com/questions/4450045/difference-between-matches-and-find-in-java-regex

[reply]

Ankit Kumar says:

February 2, 2017 at 12:23 am

Hey there,

Could you please tell me the regular expression for a sentence boundary (it may be (.), (?), or a combination of both) in a given line?
A line may contain one or more sentences, and I want to print each sentence on its own line.

Example: the line is like this:
Hey LB , how are you ?? what do u do ? ok bye .

Then the sentences for this line would be:
Hey LB , How are you ??
What do u do ?
ok bye.
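
One possible sketch: treat a sentence as a run of characters that are neither . nor ?, followed by one or more of those terminators, and collect each match. The class and method names are hypothetical, and the sketch does not capitalize or otherwise rewrite the text. Anything after the last terminator is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceSplitDemo {
    // Sentence body (no '.' or '?'), then one or more '.' or '?' terminators.
    static final Pattern SENTENCE = Pattern.compile("[^.?]+[.?]+");

    // Returns the sentences of the line in order, with surrounding spaces trimmed.
    static List<String> sentences(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = SENTENCE.matcher(line);
        while (m.find()) {
            out.add(m.group().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : sentences("Hey LB , how are you ?? what do u do ? ok bye .")) {
            System.out.println(s);
        }
    }
}
```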

i need a java code for this pattern says:

September 15, 2017 at 6:08 am

0
101
11011
2110112
321101123
53211011235
53211011235
321101123
2110112
11011
101
0

