
Regular Expression for Matching URLs in ASP.NET

I am working on an ASP.NET project in which certain URLs must be blocked from being entered in a text box.

First, the regular expression:

string check = @"((http|ftp|https)://)(([a-zA-Z0-9\._-]+\.[a-zA-Z]{2,6})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,4})?(/[a-zA-Z0-9\&%_\./-~-]*)?";

Description of this regular expression:

①: The string matched by this regular expression must start with http://, https:// or ftp://;
②: It can match both domain names and IP addresses (such as http://www.baidu.com or http://192.168.1.1);
③: It can match the rest of the URL after the host, i.e., sub-URLs with paths and query strings (for example, it matches http://www.baidu.com/s?wd=a&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg&inputT=1236);
④: It can match an optional port number.
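A minimal sketch, assuming .NET's System.Text.RegularExpressions, for checking the pattern against the sample URLs above. It uses the pattern with the two missing closing braces restored ({1,3} and {1,4}) and the port group made optional with ?; the class name UrlPattern and the method FindUrls are hypothetical. Note that /-~ inside the last character class is a range from '/' to '~', which covers most printable ASCII characters; that range is why ? and = in query strings are matched.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the article's URL pattern (braces restored, port optional).
public static class UrlPattern
{
    // "/-~" in the last character class is a range from '/' (0x2F) to '~' (0x7E),
    // so query-string characters such as '?' and '=' are accepted in the path.
    public const string Check =
        @"((http|ftp|https)://)(([a-zA-Z0-9\._-]+\.[a-zA-Z]{2,6})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,4})?(/[a-zA-Z0-9\&%_\./-~-]*)?";

    private static readonly Regex UrlRegex = new Regex(Check);

    // Returns every URL-like substring found in the input text, in order of appearance.
    public static string[] FindUrls(string text)
    {
        MatchCollection matches = UrlRegex.Matches(text);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }
}
```

For example, FindUrls("see http://www.baidu.com and http://192.168.1.1:8080/index.html") should return both URLs, the second one including its port and path.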

Blocking specified URLs:

Suppose we want to block the URL http://www.baidu.com in the input text box. The traditional method is to use the regular expression above to extract all the URLs in the text box and then compare each one with the URLs to be blocked. However, this method has a drawback: the extracted URLs include their sub-URLs (paths and query strings), while the configuration file may list only a parent URL. We would therefore have to trim each extracted URL, add the website's default port number 80, compare port numbers, and so on. So I came up with a different method:

Read the URLs to be blocked from the configuration file and build a regular expression from them; then match that regular expression against the text in the text box, and if it matches, block the input.

In the configuration file, it should be written as: <add key="DomainCheckBlackUrl" value="baidu.com" />

Implementation in code:

The regular expression is composed of three parts:

1: The beginning: the protocol (http://, https:// or ftp://) followed by any number of subdomain prefixes such as www.;
2: The middle: the domain read from the configuration file;
3: The end: an optional port number, followed either by subdirectories (a path) or by the point where the domain ends.
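To see what each of the three parts contributes, the sketch below wraps them in named groups. It assumes the start and end strings given in the steps that follow, with the end part written as (:[0-9]{1,4})?((/[a-zA-Z0-9\&%_\./-~-]*)|(?![a-zA-Z0-9\.])) so that its parentheses balance. The group names start, middle and end, the class PatternParts and the method Decompose are hypothetical, and Regex.Escape is an addition so that the dot in the configured domain matches only a literal dot.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that shows which piece of a URL each of the three parts matches.
public static class PatternParts
{
    // Part 1: protocol plus any number of subdomain labels such as "www."
    private const string Start = @"((http|ftp|https)://)([a-zA-Z0-9_-]+\.)*";

    // Part 3: optional port, then a path, or a position where the domain ends
    // (next character is not a letter, digit or dot, or there is no next character).
    private const string End = @"(:[0-9]{1,4})?((/[a-zA-Z0-9\&%_\./-~-]*)|(?![a-zA-Z0-9\.]))";

    // Returns the text matched by { start, middle, end }, or an empty array if there is no match.
    public static string[] Decompose(string url, string blockedDomain)
    {
        // Part 2: the blocked domain, which must be preceded by a non-alphanumeric character.
        string pattern = "(?<start>" + Start + ")"
            + "(?<middle>(?<=[^a-zA-Z0-9])" + Regex.Escape(blockedDomain) + ")"
            + "(?<end>" + End + ")";

        Match m = Regex.Match(url, pattern);
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { m.Groups["start"].Value, m.Groups["middle"].Value, m.Groups["end"].Value };
    }
}
```

With "baidu.com" as the blocked domain, "http://www.baidu.com:8080/s?wd=a" splits into "http://www.", "baidu.com" and ":8080/s?wd=a", while "http://www.notbaidu.com" does not match at all, because a subdomain label must end with a dot.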

First, read the blocked domains from the configuration file, splitting the value on commas: string[] serverlist = ConfigurationManager.AppSettings["DomainCheckBlackUrl"].Split(',');
Second, the beginning of the regular expression: string start = @"((http|ftp|https)://)([a-zA-Z0-9_-]+\.)*";
Then, the end of the regular expression: string end = @"(:[0-9]{1,4})?((/[a-zA-Z0-9\&%_\./-~-]*)|(?![a-zA-Z0-9\.]))";
Finally, the combined regular expression, where CutStr is one domain taken from serverlist: string check = start + @"((?<=[^a-zA-Z0-9])(" + CutStr + "))" + end;
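Putting the steps together, here is a minimal sketch of the whole check under stated assumptions, not the author's exact code. The comma-separated value that the article reads from ConfigurationManager.AppSettings["DomainCheckBlackUrl"] is passed in as a string parameter so the example runs without a web.config; each entry is trimmed and passed through Regex.Escape before it takes the place of CutStr; matching is case-insensitive; and the end part uses the negative lookahead (?![a-zA-Z0-9\.]), which, unlike a positive lookahead such as (?=[^a-zA-Z0-9\.]), also succeeds at the end of the input. The class UrlBlacklist and the method IsBlocked are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical blacklist check built from the three-part pattern.
// The configuration value is a parameter instead of being read from AppSettings.
public static class UrlBlacklist
{
    // Part 1: protocol plus optional subdomain labels such as "www."
    private const string Start = @"((http|ftp|https)://)([a-zA-Z0-9_-]+\.)*";

    // Part 3: optional port, then a path or a point where the domain ends.
    private const string End = @"(:[0-9]{1,4})?((/[a-zA-Z0-9\&%_\./-~-]*)|(?![a-zA-Z0-9\.]))";

    // Returns true if text contains a URL on any domain listed in blackUrlConfig,
    // a comma-separated list such as "baidu.com,example.org".
    public static bool IsBlocked(string text, string blackUrlConfig)
    {
        string[] serverlist = blackUrlConfig.Split(',');
        foreach (string entry in serverlist)
        {
            string cutStr = entry.Trim();
            if (cutStr.Length == 0)
            {
                continue; // skip empty entries, e.g. from a trailing comma
            }

            // Part 2: the configured domain, escaped so "." matches only a literal dot.
            string check = Start + @"((?<=[^a-zA-Z0-9])(" + Regex.Escape(cutStr) + "))" + End;
            if (Regex.IsMatch(text, check, RegexOptions.IgnoreCase))
            {
                return true;
            }
        }
        return false;
    }
}
```

In the page, one would then call something like UrlBlacklist.IsBlocked(TextBox1.Text, ConfigurationManager.AppSettings["DomainCheckBlackUrl"]) (TextBox1 being a hypothetical control name) and reject the input when it returns true. Escaping matters here: without it, "baidu.com" would also match a host such as "baiduxcom".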

These are some of my personal insights; I hope they are helpful to everyone.
