Extract URLs using Java regular expressions

Friday, 17 June 2011 18:20

In this sample we use Java regular expressions to extract URLs from a block of text.

Java method to extract urls

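Let's break the regular expression pattern down. Each element is shown as the regex engine sees it, after Java string unescaping (so \\ below is written \\\\ inside a Java string literal):

Pattern         Description                                          Reference
(               Start of group #1 (the whole URL)
(               Start of group #2 (the protocol)
https?          http or https (the ? makes the s optional)           Literal
|ftp            or the ftp protocol                                  Literal
|gopher         or the gopher protocol                               Literal
|telnet         or the telnet protocol                               Literal
|file           or the file protocol                                 Literal
)               End of group #2
:               Colon separator                                      Literal
(               Start of group #3 (the separator after the colon)
(               Start of group #4
//              Double slash                                         Literal
)               End of group #4
|               or
(               Start of group #5
\\              A single backslash (escaped with a backslash)        Literal
)               End of group #5
)               End of group #3
+               Repeat group #3 one or more times, so two            Quantifier
                backslashes match as two repetitions
[               Start of a character class                           Character class
\w              A word character: [a-zA-Z_0-9]                       Predefined character class
\d              Any digit (already covered by \w)                    Predefined character class
:               Colon                                                Literal
#@%/;$()~_?     Number sign, at sign, percent sign, slash,           Literal
                semicolon, dollar sign, parentheses, tilde,
                underscore or question mark
+ - =           Plus sign, minus sign or equals sign                 Literal
                (see the note on the minus sign below)
\\              A backslash                                          Literal
\.              A dot                                                Literal
&               An ampersand                                         Literal
]               End of the character class                           Character class
*               Zero or more times                                   Quantifier
)               End of group #1

Inside a character class, ( and ) are plain characters, so they do not open groups. The minus sign is different: to be literal it must be escaped as \- (or placed last in the class), because an unescaped \+-= is read as the range from + to =, which also matches , and <.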
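Java numbers capturing groups by their opening parenthesis, counting from left to right, and group 0 is always the whole match. To confirm the numbering for this pattern, the sketch below compiles the URL pattern (with the hyphen in the character class escaped) and lists every group for one sample URL. The class name UrlGroupsDemo, the method groupsOf and the input http://example.com/a?b=1 are illustrative choices, not part of the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: lists the capturing groups of the URL pattern
class UrlGroupsDemo {
    // The article's URL pattern; "\\-" keeps the minus sign literal inside [...]
    static final Pattern URL = Pattern.compile(
            "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+\\-=\\\\\\.&]*)",
            Pattern.CASE_INSENSITIVE);

    // Returns groups 0..groupCount() of the first match, or an empty list if nothing matches
    static List<String> groupsOf(String input) {
        List<String> groups = new ArrayList<String>();
        Matcher m = URL.matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i)); // null if the group took no part in the match
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups = groupsOf("http://example.com/a?b=1");
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group " + i + ": " + groups.get(i));
        }
    }
}
```

Running main prints six groups: groups 0 and 1 both hold http://example.com/a?b=1, group 2 is http, groups 3 and 4 are //, and group 5 is null because the backslash alternative did not take part in this match. For an input such as file:\\name.txt it is group 4 that stays null instead.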

Java regex extract multiple urls

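import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private List<String> extractUrls(String value) {
    if (value == null) throw new IllegalArgumentException("urls to extract must not be null");
    List<String> result = new ArrayList<String>();
    String urlPattern = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+\\-=\\\\\\.&]*)"; // "\\-" = literal minus sign
    Pattern p = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE); // also accepts HTTP://, FTP:// ...
    Matcher m = p.matcher(value);
    while (m.find()) {
        result.add(m.group()); // group 0: the whole matched URL
    }
    return result;
}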
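The character class decides where each extracted URL ends, so its contents matter. Inside [...] an unescaped - between two characters forms a range: written as \+-=, the class covers every character from + to =, including the comma and <, not just the three symbols. The sketch below illustrates the difference on a URL followed by a comma; the class RangeDemo and its helper firstUrl are hypothetical and use a simplified http(s)-only pattern, not the article's full one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: an unescaped '-' inside [...] creates a range
class RangeDemo {
    // [\+-=] is the range '+' (0x2B) to '=' (0x3D): + , - . / 0-9 : ; < =
    static final Pattern RANGE = Pattern.compile("[\\+-=]");
    // [\+\-=] holds exactly three characters: '+', '-' and '='
    static final Pattern LITERAL = Pattern.compile("[\\+\\-=]");

    // First http(s) URL whose tail is drawn from the given class body, or null if none
    static String firstUrl(String classBody, String input) {
        Matcher m = Pattern.compile("https?://[" + classBody + "]*").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(RANGE.matcher(",").matches());   // true: ',' lies inside the range
        System.out.println(LITERAL.matcher(",").matches()); // false
        String text = "see http://example.com, then stop";
        System.out.println(firstUrl("\\w.\\+-=", text));    // http://example.com,
        System.out.println(firstUrl("\\w.\\+\\-=", text));  // http://example.com
    }
}
```

With the range form the trailing comma of an ordinary sentence ends up inside the URL; escaping the hyphen keeps it out.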

Extracting the urls using our Pattern

Suppose the input text is file://simpleFileUrl.txt file:\\backslashUrl.txt (inside a Java string literal each backslash is doubled, giving file:\\\\backslashUrl.txt).

Run the method with the following sample code:

String content = " file://simpleFileUrl.txt file:\\\\backslashUrl.txt";
List<String> result = extractUrls(content);
for (String url : result) {
    System.out.println("url :" + url);
}

regex urls extraction result

url :file://simpleFileUrl.txt
url :file:\\backslashUrl.txt
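The output shows file:\\backslashUrl.txt with two backslashes, while the Java source wrote four. Three layers are involved: the Java string literal, the regex text it produces, and the characters that regex matches. The sketch below makes each layer visible; the class EscapeDemo and its constant names are illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the escaping levels behind a backslash in a Java regex
class EscapeDemo {
    // Java literal "file:\\\\backslashUrl.txt" holds the text file:\\backslashUrl.txt
    static final String CONTENT = "file:\\\\backslashUrl.txt";
    // Java literal "\\\\" holds the two-character regex \\, an escaped backslash
    static final String ONE_BACKSLASH = "\\\\";

    public static void main(String[] args) {
        System.out.println(CONTENT);                  // file:\\backslashUrl.txt
        System.out.println(ONE_BACKSLASH.length());   // 2
        // The regex \\ matches exactly one backslash character ...
        System.out.println(Pattern.matches(ONE_BACKSLASH, "\\"));                // true
        System.out.println(Pattern.matches(ONE_BACKSLASH, "\\\\"));              // false
        // ... so a + quantifier is what lets it consume two
        System.out.println(Pattern.matches("(" + ONE_BACKSLASH + ")+", "\\\\")); // true
    }
}
```

This is why the pattern writes the backslash alternative as (\\\\) in Java source and relies on the + after the separator group to accept file:\\ style URLs.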