Extract URLs using Java regular expressions
Last Updated on Monday, 20 June 2011 13:55
In this sample we use Java regular expressions to extract URLs from a piece of text.
Java method to extract urls
Let's define the regular expression pattern, written here as a Java string literal (so every backslash is doubled):
((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)
| Pattern | Description | Reference |
|---|---|---|
| ( | Start of group #1 (the whole URL) | |
| ( | Start of group #2 (the protocol) | |
| https? | http, optionally followed by s (http or https) | Literal |
| \| | Alternation (or) | |
| ftp | ftp protocol | Literal |
| \| | Alternation (or) | |
| gopher | gopher protocol | Literal |
| \| | Alternation (or) | |
| telnet | telnet protocol | Literal |
| \| | Alternation (or) | |
| file | file protocol | Literal |
| ) | End of group #2 | |
| : | Colon separator | Literal |
| ( | Start of group #3 | |
| ( | Start of group #4 | |
| // | Double slash | Literal |
| ) | End of group #4 | |
| \| | Alternation (or) | |
| ( | Start of group #5 | |
| \\\\ | One backslash (the Java string literal \\\\ is the regex \\, which matches a single backslash) | Literal |
| ) | End of group #5 | |
| )+ | End of group #3, repeated one or more times (so two backslashes in a row also match) | |
| [ | Start of a simple character class | Character class |
| \\w | Any word character (letter, digit or underscore) | Predefined character classes |
| \\d | Any digit | Predefined character classes |
| : | Colon character | Literal |
| #@%/;$()~_? | Number sign, at symbol, percent sign, slash, semicolon, dollar sign, parentheses, tilde, underscore or question mark | Literal |
| \\+-= | Range from the plus sign to the equals sign: + , - . / the digits 0 to 9 : ; < = | Range |
| \\\\ | One backslash | Literal |
| \\. | A dot | Literal |
| & | An ampersand | Literal |
| ]* | End of the character class, repeated zero or more times | Character class |
| ) | End of group #1 | |
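
The group numbers in the table can be checked by reading the groups back from a match. The following is a minimal sketch, not the article's original code: the class name `UrlGroups`, the helper `protocolOf` and the `example.org` address are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from the original article.
final class UrlGroups {

    // The article's pattern, as a Java string literal.
    static final Pattern URL_PATTERN = Pattern.compile(
        "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)");

    // Returns the protocol captured by group 2 of the first match,
    // or null when the text contains no URL.
    static String protocolOf(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // example.org is a placeholder address chosen for this sketch.
        Matcher m = URL_PATTERN.matcher("see ftp://example.org/pub/file.txt");
        if (m.find()) {
            System.out.println("group 1 (url)      : " + m.group(1));
            System.out.println("group 2 (protocol) : " + m.group(2));
            System.out.println("group 4 (slashes)  : " + m.group(4));
            // Group 5 did not take part in this match, so group(5) is null.
            System.out.println("group 5 (backslash): " + m.group(5));
        }
    }
}
```

Groups are numbered by the position of their opening parenthesis, which is why the protocol is group 2 and the slash and backslash alternatives are groups 4 and 5.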

Extracting the urls using our Pattern
Suppose we run the method on the following content (written as a Java string literal, so the last URL really contains two backslashes):
http://www.ubiteck.com/test/mypage.jsf?param1=ok file://simpleFileUrl.txt file:\\\\backslashUrl.txt
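
A minimal sketch of such an extraction method, with a small driver for the content above, could look like this. The class name `UrlExtractor` and the method name `extractUrls` are assumptions made for illustration; only the pattern and the sample content come from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative assumptions.
final class UrlExtractor {

    // The article's pattern, as a Java string literal.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)");

    // Returns every substring of the text that matches the pattern, in order.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group(1)); // group 1 spans the whole URL
        }
        return urls;
    }

    public static void main(String[] args) {
        String content = "http://www.ubiteck.com/test/mypage.jsf?param1=ok "
            + "file://simpleFileUrl.txt file:\\\\backslashUrl.txt";
        for (String url : extractUrls(content)) {
            System.out.println("url :" + url);
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, and `Matcher.find()` advances through the text so that each URL is reported once.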
Running the method on this content prints:
url :http://www.ubiteck.com/test/mypage.jsf?param1=ok
url :file://simpleFileUrl.txt
url :file:\\backslashUrl.txt




