Can anyone see any flaws in this regex for real-world URLs?

>>> import re
>>> # named "text" rather than "str" so the built-in str type is not shadowed
>>> text = 'and now http://sub.domain.com/something/?here3=3ab&what=1#where=1 that was a URL'
>>> urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&#+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
>>> urls
['http://sub.domain.com/something/?here3=3ab&what=1#where=1']

It looks like it works to me, but you never know… Comments from @HD42 would be highly appreciated =)
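
A few edge cases are worth trying before trusting the pattern. The sketch below runs the same regex, unchanged, over three made-up sample strings (the example.com URLs and the names URL_RE, samples and results are illustrative assumptions, not part of the original test). Note that inside the character class, `$-_` is read as a range from `$` to `_`, which is what lets `/`, `:`, `?` and `=` through, but it also admits `)`, `,`, `<` and `>`, while `~` is not covered at all.

```python
import re

# The pattern from the post, unchanged.
URL_RE = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&#+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)

# Hypothetical sample strings chosen to probe the edges of the pattern.
samples = [
    'Visit http://example.com/page.',             # sentence-final period
    'see (http://example.com/a), ok',             # URL wrapped in parentheses
    'home: http://example.com/~user/index.html',  # tilde in the path
]

results = [URL_RE.findall(s) for s in samples]

for sample, found in zip(samples, results):
    print(repr(sample), '->', found)

# Prints:
# 'Visit http://example.com/page.' -> ['http://example.com/page.']
# 'see (http://example.com/a), ok' -> ['http://example.com/a),']
# 'home: http://example.com/~user/index.html' -> ['http://example.com/']
```

So trailing punctuation gets swallowed into the match, and a URL containing `~` is cut short. Whether that matters depends on the text being scanned; stripping trailing `.,)` from each match afterwards is one possible workaround.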
