
Parsing URLs

I want to share a useful utility written in pure JavaScript: URL. It is a small URL parser that works almost like window.location but does not reload the browser page when you manipulate it.

Along the way I will say a few words about getters and setters in JavaScript.

UPD1: by popular demand, I am moving the examples up here:
// current page URL = 'http://my.site.com/somepath/'
var u = new URL( 'relative/path/index.html' )
u.href // http://my.site.com/somepath/relative/path/index.html
u.href = '/absolute/path.php?a=8#some-hash'
u.href // http://my.site.com/absolute/path.php?a=8#some-hash
u.hash // #some-hash
u.protocol = 'https:'
u.href // https://my.site.com/absolute/path.php?a=8#some-hash
u.host = 'another.site.com:8080'
u.href // https://another.site.com:8080/absolute/path.php?a=8#some-hash
u.port // 8080
// and so on


It works in FF3+ (perhaps in 2+ as well, I have not tried) and in IE6+ (the IE part is my own know-how :-)).
The article also contains a fully cross-browser implementation, but it is a bit more cumbersome to use:
// current page URL = 'http://my.site.com/somepath/'
var u = new URL( 'relative/path/index.html' )
u.href() // http://my.site.com/somepath/relative/path/index.html
u.href( '/absolute/path.php?a=8#some-hash' )
u.href() // http://my.site.com/absolute/path.php?a=8#some-hash
// and so on

And yes, I give my listings in full. Sorry, that is how it has to be.


UPD2: a brief explanation of what my library is for:
This tool grew out of practical needs.
I have already seen several home-grown implementations with a similar purpose in large JS projects such as TinyMCE. In a rich text editor (RTE) you often deal with links to resources, and those links have to be processed in real time.

In my case, I had to parse the current URL, change or add a parameter in the search part, and then redirect.
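
The search-parameter use case can be sketched roughly as follows. This is only an illustration: setSearchParam is a hypothetical helper that is not part of the library, and it assumes the search string has the usual "?name=value&name=value" shape.

```javascript
// Hypothetical helper (not part of the article's library):
// returns a new search string in which `name` is set to `value`.
function setSearchParam(search, name, value) {
  var pairs = search.replace(/^\?/, '').split('&');
  var encodedName = encodeURIComponent(name);
  var encodedPair = encodedName + '=' + encodeURIComponent(value);
  var result = [];
  var replaced = false;
  for (var i = 0; i < pairs.length; i++) {
    if (pairs[i] === '') continue;              // skip empty chunks, e.g. when search is ""
    if (pairs[i].split('=')[0] === encodedName) {
      if (!replaced) result.push(encodedPair);  // replace the first occurrence, drop duplicates
      replaced = true;
    } else {
      result.push(pairs[i]);                    // keep unrelated parameters untouched
    }
  }
  if (!replaced) result.push(encodedPair);      // parameter was absent: append it
  return '?' + result.join('&');
}

var changed = setSearchParam('?a=8&b=2', 'a', '9'); // "?a=9&b=2"
var added = setSearchParam('', 'page', '3');        // "?page=3"
```

The resulting string is what would be written into the search part of the parsed URL before redirecting.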

You can think of more uses.

Problem


What is the problem? It is this:
  1. We cannot use the window.location object itself, since changing almost any of its properties (everything except hash) makes the browser navigate away from or reload the current page.
  2. We cannot create another object of the same kind through the Location constructor: tut-tut, browsers forbid it!
  3. The object itself behaves in a rather non-trivial way.
  4. And, well, I did not find any ready-made implementation :)

I mentioned the non-trivial behavior. Here it is:

Figure: Link URL Parts

When any part of the URL changes, the parts that depend on it must be updated as well.
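
To make the dependency concrete, here is a minimal sketch. composeUrl is a hypothetical helper introduced only for this illustration; it assumes that host is always hostname plus an optional ":port", and that href is the concatenation of all parts.

```javascript
// Hypothetical helper: derives the dependent parts (host, href)
// from the independent ones.
function composeUrl(parts) {
  var host = parts.hostname + (parts.port ? ':' + parts.port : '');
  return {
    host: host,
    href: parts.protocol + '//' + host + parts.pathname + parts.search + parts.hash
  };
}

var parts = {
  protocol: 'https:', hostname: 'example.com', port: '8080',
  pathname: '/some/path/index.html', search: '?p=1', hash: '#some-hash'
};
var before = composeUrl(parts); // host and href both contain ":8080"
parts.port = '';                // change one part...
var after = composeUrl(parts);  // ...and both host and href change with it
```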

Parsing


In essence, I am building a look-alike of window.location, so I borrow the property names from it. Let us look at an example:

Figure: Parse URL Parts

No comments :)

Like it or not, you cannot do without RegExp


The main work is done, of course, by a regular expression:
var pattern = "^(([^:/\\?#]+):)?(//(([^:/\\?#]*)(?::([^/\\?#]*))?))?([^\\?#]*)(\\?([^#]*))?(#(.*))?$" ;


Now in more detail:
var pattern =
// Match #0. The whole URL (#0 is HREF in window.location terms).
// E.g. #0 == "https://example.com:8080/some/path/index.html?p=1&q=2&r=3#some-hash"
"^" +
// Match #1 & #2. SCHEME (#1 is PROTOCOL in window.location terms).
// E.g. #1 == "https:", #2 == "https"
"(([^:/\\?#]+):)?" +
// Match #3-#6. AUTHORITY (#4 = HOST, #5 = HOSTNAME, #6 = PORT in window.location terms).
// E.g. #3 == "//example.com:8080", #4 == "example.com:8080", #5 == "example.com", #6 == "8080"
"(" +
"//(([^:/\\?#]*)(?::([^/\\?#]*))?)" +
")?" +
// Match #7. PATH (#7 is PATHNAME in window.location terms).
// E.g. #7 == "/some/path/index.html"
"([^\\?#]*)" +
// Match #8 & #9. QUERY (#8 is SEARCH in window.location terms).
// E.g. #8 == "?p=1&q=2&r=3", #9 == "p=1&q=2&r=3"
"(\\?([^#]*))?" +
// Match #10 & #11. FRAGMENT (#10 is HASH in window.location terms).
// E.g. #10 == "#some-hash", #11 == "some-hash"
"(#(.*))?" + "$" ;



As you might guess, this RegExp uses only common syntax, so it should work not only in JavaScript but also in most other languages with Perl-style regular expressions. Enjoy! ;)
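
A quick way to check the group numbering described above is to run the pattern against the sample URL from the comments. The sketch below only assumes the pattern from the article; the variable names full, relative and named are mine.

```javascript
var pattern = "^(([^:/\\?#]+):)?(//(([^:/\\?#]*)(?::([^/\\?#]*))?))?([^\\?#]*)(\\?([^#]*))?(#(.*))?$";
var rx = new RegExp(pattern);

// A full URL: every group is filled in.
var full = rx.exec("https://example.com:8080/some/path/index.html?p=1&q=2&r=3#some-hash");
var named = {
  protocol: full[1],  // "https:"
  host:     full[4],  // "example.com:8080"
  hostname: full[5],  // "example.com"
  port:     full[6],  // "8080"
  pathname: full[7],  // "/some/path/index.html"
  search:   full[8],  // "?p=1&q=2&r=3"
  hash:     full[10]  // "#some-hash"
};

// A relative URL: the optional groups that did not take part in the match
// come back as undefined, which is why the parser code falls back to "".
var relative = rx.exec("relative/path/index.html?x=1");
```

Here relative[1], relative[4] and relative[10] are undefined, while relative[7] holds the relative path and relative[8] the search string.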

Attempt #1


function URL(url) {
    url = url || "";
    this.parse(url);
}
URL.prototype = {
    // after changing this.href by hand, call this.parse()
    href: "",
    // after changing any other part by hand, call this.update()
    protocol: "",
    host: "",
    hostname: "",
    port: "",
    pathname: "",
    search: "",
    hash: "",

    parse: function (url) {
        url = url || this.href;
        var pattern = "^(([^:/\\?#]+):)?(//(([^:/\\?#]*)(?::([^/\\?#]*))?))?([^\\?#]*)(\\?([^#]*))?(#(.*))?$";
        var rx = new RegExp(pattern);
        var parts = rx.exec(url);

        this.href = parts[0] || "";
        this.protocol = parts[1] || "";
        this.host = parts[4] || "";
        this.hostname = parts[5] || "";
        this.port = parts[6] || "";
        this.pathname = parts[7] || "/";
        this.search = parts[8] || "";
        this.hash = parts[10] || "";

        this.update();
    },

    update: function () {
        // no protocol given: take it from the current page
        if (!this.protocol)
            this.protocol = window.location.protocol;

        // a relative pathname/URL is resolved against the current page
        this.pathname = this.pathname.replace(/^\s*/g, '');
        if (!this.host && this.pathname && !/^\//.test(this.pathname)) {
            // replace the last segment of the current path with the relative one
            var _p = window.location.pathname.split('/');
            _p[_p.length - 1] = this.pathname;
            this.pathname = _p.join('/');
        }

        // no hostname given: take it from the current page
        if (!this.hostname)
            this.hostname = window.location.hostname;

        this.host = this.hostname + (("" + this.port) ? ":" + this.port : "");
        this.href = this.protocol + '//' + this.host + this.pathname + this.search + this.hash;
    },

    /**
     * Counterpart of window.location.assign: navigates to the given URL.
     */
    assign: function (url) {
        this.parse(url);
        window.location.assign(this.href);
    },

    /**
     * Counterpart of window.location.replace: navigates to the given URL,
     * replacing the current entry in the history.
     */
    replace: function (url) {
        this.parse(url);
        window.location.replace(this.href);
    }
};



In detail



Everything would be fine, but we force the user to call update(...) or parse(...) after changing any part of the URL (for example, the port). That is horrible: the user can forget to do it, and then everything falls apart.

Unfortunately, this implementation cannot get rid of that. But it can all be done differently :)
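
The failure mode is easy to reproduce without a browser. The sketch below is a stripped-down stand-in for the object from attempt #1 (a plain object with the same kind of fields and an update() method; the values are made up for the example), not the library code itself.

```javascript
// Plain fields, as in attempt #1: nothing ties them together automatically.
var u = {
  protocol: 'http:', hostname: 'my.site.com', port: '',
  pathname: '/somepath/', search: '', hash: '',
  href: 'http://my.site.com/somepath/',
  update: function () {
    var host = this.hostname + (this.port ? ':' + this.port : '');
    this.href = this.protocol + '//' + host + this.pathname + this.search + this.hash;
  }
};

u.port = '8080';
var stale = u.href;  // still the old address: the port change went unnoticed
u.update();
var fresh = u.href;  // only now does href include the port
```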

Attempt #2


And now I will propose an acceptable option. We need getters and setters. The most obvious way is to create a pair of methods for each parameter (for protocol, that would be getProtocol() and setProtocol(newProtocol)). But I do not like this approach because it is bulky.

Let's do it in a more JavaScript-like way. There will be a single protocol(...) method: called without arguments it is a getter, called with one argument it is a setter.
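
In miniature, the combined getter/setter idea looks roughly like this. Link is a hypothetical constructor used only for this sketch; the real value sits in a local variable of the constructor, so it can be reached only through the method.

```javascript
function Link() {
  var protocol = 'http:';              // private: visible only inside this constructor call

  // One method, two roles: getter without arguments, setter with one argument.
  this.protocol = function (val) {
    if (typeof val != 'undefined') {
      protocol = val;                  // setter branch
    }
    return protocol;                   // both branches return the current value
  };
}

var link = new Link();
var initial = link.protocol();         // "http:"
link.protocol('https:');
var updated = link.protocol();         // "https:"
```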

We will hide the real data in the closure.
var URL;

// Everything is wrapped in a closure so that parseURL and updateURL stay private.
(function () {

    URL = function (url) {
        // The real data lives in these local variables. Every URL object gets
        // its own copy, and nothing outside can reach it directly.
        var href = "", protocol = "", host = "", hostname = "", port = "",
            pathname = "", search = "", hash = "";

        // Each method below is a getter without arguments and a setter with one.
        // Get/set href: the setter calls parseURL.call(this),
        // because parseURL is private and needs the URL object as this.
        this.href = function (val) {
            if (typeof val != "undefined") {
                href = val;
                parseURL.call(this);
            }
            return href;
        };

        // Get/set protocol
        // Unlike set href, set protocol calls updateURL.call(this) to rebuild href.
        this.protocol = function (val) {
            if (typeof val != "undefined") {
                // empty value: keep the current protocol or take it from window.location
                if (!val)
                    val = protocol || window.location.protocol;
                protocol = val;
                updateURL.call(this);
            }
            return protocol;
        };

        // Get/set host
        // host, hostname and port depend on each other,
        // so set host updates all three.
        this.host = function (val) {
            if (typeof val != "undefined") {
                val = val || '';
                var v = val.split(':');
                var h = v[0], p = v[1] || '';
                host = val;
                hostname = h;
                port = p;
                updateURL.call(this);
            }
            return host;
        };

        // Get/set hostname
        // Rebuilds host from hostname and port.
        this.hostname = function (val) {
            if (typeof val != "undefined") {
                if (!val)
                    val = hostname || window.location.hostname;
                hostname = val;
                host = val + (("" + port) ? ":" + port : "");
                updateURL.call(this);
            }
            return hostname;
        };

        // Get/set port
        // Rebuilds host from hostname and port.
        this.port = function (val) {
            if (typeof val != "undefined") {
                port = val;
                host = hostname + (("" + port) ? ":" + port : "");
                updateURL.call(this);
            }
            return port;
        };

        // Get/set pathname
        // A pathname that does not start with '/' is relative: it replaces
        // the last segment of the current pathname.
        this.pathname = function (val) {
            if (typeof val != "undefined") {
                if (val.indexOf("/") != 0) { // relative url
                    var _p = (pathname || window.location.pathname).split("/");
                    _p[_p.length - 1] = val;
                    val = _p.join("/");
                }
                pathname = val;
                updateURL.call(this);
            }
            return pathname;
        };

        // Get/set search
        this.search = function (val) {
            if (typeof val != "undefined") {
                search = val;
                updateURL.call(this); // keep href in sync
            }
            return search;
        };

        // Get/set hash
        this.hash = function (val) {
            if (typeof val != "undefined") {
                hash = val;
                updateURL.call(this); // keep href in sync
            }
            return hash;
        };

        url = url || "";
        parseURL.call(this, url);
    };

    URL.prototype = {
        /**
         * Counterpart of window.location.assign: navigates to the given URL.
         */
        assign: function (url) {
            parseURL.call(this, url);
            window.location.assign(this.href());
        },

        /**
         * Counterpart of window.location.replace: navigates to the given URL,
         * replacing the current entry in the history.
         */
        replace: function (url) {
            parseURL.call(this, url);
            window.location.replace(this.href());
        }
    };

    // Private function: splits the URL into parts and stores them through the setters.
    // Must be called with a URL object as this.
    function parseURL(url) {
        if (this._innerUse)
            return;

        url = url || this.href();
        var pattern = "^(([^:/\\?#]+):)?(//(([^:/\\?#]*)(?::([^/\\?#]*))?))?([^\\?#]*)(\\?([^#]*))?(#(.*))?$";
        var rx = new RegExp(pattern);
        var parts = rx.exec(url);

        // Prevent infinite recursion
        this._innerUse = true;

        this.href(parts[0] || "");
        this.protocol(parts[1] || "");
        //this.host(parts[4] || "");
        this.hostname(parts[5] || "");
        this.port(parts[6] || "");
        this.pathname(parts[7] || "/");
        this.search(parts[8] || "");
        this.hash(parts[10] || "");

        delete this._innerUse;

        updateURL.call(this);
    }

    // Private function: rebuilds href from the individual parts.
    // Must be called with a URL object as this.
    // It is called from every setter except set href.
    function updateURL() {
        if (this._innerUse)
            return;

        // Prevent infinite recursion
        this._innerUse = true;

        this.href(this.protocol() + '//' + this.host() + this.pathname() + this.search() + this.hash());

        delete this._innerUse;
    }

})();



In general, the code is self-documenting, so I will explain only the key points:
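
One point that may deserve a separate illustration is the _innerUse flag. A setter triggers a recalculation, and the recalculation writes back through the same setter, so without a guard the two would call each other forever. Below is a minimal sketch of that guard; Box, normalize and the calls counter are hypothetical names for this example and are not part of the library.

```javascript
var calls = 0;                        // counts how many times normalize really runs

function Box() {
  var text = '';
  this.text = function (val) {
    if (typeof val != 'undefined') {
      text = val;
      normalize.call(this);           // every set triggers a recalculation
    }
    return text;
  };
}

function normalize() {
  if (this._innerUse)                 // already inside normalize: stop here
    return;
  this._innerUse = true;              // raise the flag before touching any setter
  calls++;
  this.text(this.text().toLowerCase()); // re-enters the setter, which calls normalize again
  delete this._innerUse;              // lower the flag when done
}

var box = new Box();
box.text('HeLLo');
var result = box.text();              // "hello"
```

The nested call to normalize returns immediately because the flag is raised, so the recalculation runs exactly once per external set.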

Examples


Well, straight to the examples. After all, the main thing is to see it in action.
// current page URL = 'http://my.site.com/somepath/'
var u = new URL( 'relative/path/index.html' )
u.href() // http://my.site.com/somepath/relative/path/index.html
u.href( '/absolute/path.php?a=8#some-hash' )
u.href() // http://my.site.com/absolute/path.php?a=8#some-hash
u.hash() // #some-hash
u.protocol( 'https:' )
u.href() // https://my.site.com/absolute/path.php?a=8#some-hash
u.host( 'another.site.com:8080' )
u.href() // https://another.site.com:8080/absolute/path.php?a=8#some-hash
u.port() // 8080
// and so on


That's it. Everything works.
All in all, this is a perfectly usable version. Let's call it version 1.0 final.
Now let's move on to version 2.0 alpha, where true getters and setters come into play.

Attempt #3


I will give the code first and then go over the interesting points.
var URL;

(function () {
    var isIE = window.navigator.userAgent.indexOf('MSIE') != -1;

    URL = function (url) {
        // Private data of this URL object; reachable only through the accessors in gs.
        var data = {href: '', protocol: '', host: '', hostname: '', port: '', pathname: '', search: '', hash: ''};
        var keys = ["href", "protocol", "host", "hostname", "port", "pathname", "search", "hash"];

        var gs = {
            getHref: function () {
                return data.href;
            },
            setHref: function (val) {
                data.href = val;
                parseURL.call(this);
                return data.href;
            },

            getProtocol: function () {
                return data.protocol;
            },
            setProtocol: function (val) {
                if (!val)
                    val = data.protocol || window.location.protocol; // update || init
                data.protocol = val;
                updateURL.call(this);
                return data.protocol;
            },

            getHost: function () {
                return data.host;
            },
            setHost: function (val) {
                val = val || '';
                var v = val.split(':');
                var h = v[0], p = v[1] || '';
                data.host = val;
                data.hostname = h;
                data.port = p;
                updateURL.call(this);
                return data.host;
            },

            getHostname: function () {
                return data.hostname;
            },
            setHostname: function (val) {
                if (!val)
                    val = data.hostname || window.location.hostname; // update || init
                data.hostname = val;
                data.host = val + (("" + data.port) ? ":" + data.port : "");
                updateURL.call(this);
                return data.hostname;
            },

            getPort: function () {
                return data.port;
            },
            setPort: function (val) {
                data.port = val;
                data.host = data.hostname + (("" + data.port) ? ":" + data.port : "");
                updateURL.call(this);
                return data.port;
            },

            getPathname: function () {
                return data.pathname;
            },
            setPathname: function (val) {
                if (val.indexOf("/") != 0) { // relative url
                    var _p = (data.pathname || window.location.pathname).split("/");
                    _p[_p.length - 1] = val;
                    val = _p.join("/");
                }
                data.pathname = val;
                updateURL.call(this);
                return data.pathname;
            },

            getSearch: function () {
                return data.search;
            },
            setSearch: function (val) {
                data.search = val;
                updateURL.call(this); // keep href in sync
                return data.search;
            },

            getHash: function () {
                return data.hash;
            },
            setHash: function (val) {
                data.hash = val;
                updateURL.call(this); // keep href in sync
                return data.hash;
            }
        };

        if (isIE) { // IE5.5+
            // A hidden DOM element stands in for the URL object:
            // its onpropertychange event plays the role of a setter.
            var el = document.createElement('div');
            el.style.display = 'none';
            document.body.appendChild(el);
            el.assign = URL.prototype.assign;
            el.replace = URL.prototype.replace;
            el.onpropertychange = function () {
                var pn = event.propertyName;
                var pv = event.srcElement[event.propertyName];
                if (this._holdOnMSIE || pn == '_holdOnMSIE')
                    return pv;
                // copy the stored values back onto the element without re-triggering the setters
                this._holdOnMSIE = true;
                for (var i = 0, l = keys.length; i < l; i++)
                    el[keys[i]] = data[keys[i]];
                this._holdOnMSIE = false;
                // dispatch the changed property to its setter
                for (var i = 0, l = keys.length; i < l; i++) {
                    var key = keys[i];
                    if (pn == key) {
                        var sKey = 'set' + key.substr(0, 1).toUpperCase() + key.substr(1);
                        return gs[sKey].call(el, pv);
                    }
                }
            };
            url = url || "";
            parseURL.call(el, url);
            return el;
        } else if (this.__defineSetter__) { // FF
            for (var i = 0, l = keys.length; i < l; i++) {
                var key = keys[i];
                var suffix = key.substr(0, 1).toUpperCase() + key.substr(1);
                // Define the accessors on the instance, not on URL.prototype:
                // gs closes over this instance's data, so accessors on the shared
                // prototype would make every URL object use the newest instance's data.
                this.__defineGetter__(key, gs['get' + suffix]);
                this.__defineSetter__(key, gs['set' + suffix]);
            }
            url = url || "";
            parseURL.call(this, url);
        }
    };

    URL.prototype = {
        assign: function (url) {
            parseURL.call(this, url);
            window.location.assign(this.href);
        },

        replace: function (url) {
            parseURL.call(this, url);
            window.location.replace(this.href);
        }
    };

    function parseURL(url) {
        if (this._innerUse)
            return;

        url = url || this.href;
        var pattern = "^(([^:/\\?#]+):)?(//(([^:/\\?#]*)(?::([^/\\?#]*))?))?([^\\?#]*)(\\?([^#]*))?(#(.*))?$";
        var rx = new RegExp(pattern);
        var parts = rx.exec(url);

        // Prevent infinite recursion
        this._innerUse = true;

        this.href = parts[0] || "";
        this.protocol = parts[1] || "";
        //this.host = parts[4] || "";
        this.hostname = parts[5] || "";
        this.port = parts[6] || "";
        this.pathname = parts[7] || "/";
        this.search = parts[8] || "";
        this.hash = parts[10] || "";

        if (!isIE)
            delete this._innerUse;
        else
            this._innerUse = false;

        updateURL.call(this);
    }

    function updateURL() {
        if (this._innerUse)
            return;

        // Prevent infinite recursion
        this._innerUse = true;

        this.href = this.protocol + '//' + this.host + this.pathname + this.search + this.hash;

        if (!isIE)
            delete this._innerUse;
        else
            this._innerUse = false;
    }

})();



Let's look at how the getters / setters are created:
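
The __defineGetter__ / __defineSetter__ methods used in the listing are non-standard legacy methods. In engines that support ES5, the same effect can be achieved with the standard Object.defineProperty. Below is a minimal sketch of that variant; Endpoint is a hypothetical constructor for this example only, and it covers just host and port rather than the whole URL object.

```javascript
function Endpoint(hostname, port) {
  // Private data, shared by the accessors below through the closure.
  var data = { hostname: hostname, port: port };

  Object.defineProperty(this, 'host', {
    get: function () {
      return data.hostname + (data.port ? ':' + data.port : '');
    },
    set: function (val) {
      var v = String(val).split(':');   // "name:port" -> ["name", "port"]
      data.hostname = v[0];
      data.port = v[1] || '';
    },
    enumerable: true
  });

  Object.defineProperty(this, 'port', {
    get: function () { return data.port; },
    set: function (val) { data.port = String(val); },
    enumerable: true
  });
}

var e = new Endpoint('my.site.com', '');
e.host = 'another.site.com:8080';       // plain assignment runs the setter
var port = e.port;                      // "8080"
e.port = 81;
var host = e.host;                      // "another.site.com:81"
```

The accessors are defined on each instance rather than on the prototype, because they close over that instance's private data.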

Examples #2


// current page URL = 'http://my.site.com/somepath/'
var u = new URL( 'relative/path/index.html' )
u.href // http://my.site.com/somepath/relative/path/index.html
u.href = '/absolute/path.php?a=8#some-hash'
u.href // http://my.site.com/absolute/path.php?a=8#some-hash
u.hash // #some-hash
u.protocol = 'https:'
u.href // https://my.site.com/absolute/path.php?a=8#some-hash
u.host = 'another.site.com:8080'
u.href // https://another.site.com:8080/absolute/path.php?a=8#some-hash
u.port // 8080
// and so on


Works in FF3+ and IE6+. It can be adapted for Safari / Chrome. As for Opera, I am not sure; RTFM required.

Something like that


I hope I did something useful and did not waste my day writing this article :-)
PS: yes, I am thinking of writing a separate article on getters and setters in different browsers. Not by Firefox alone, after all (a small plug: so as not to flood Habrahabr with my stream of thoughts, welcome to my blog, http://web-by-kott.blogspot.com/ . It is still rather empty there, but I am only getting started).

Source: https://habr.com/ru/post/65407/

