Extract URLs from text and HTML page

Alexisback
Junior Member
Posts: 80
Joined: 2016-10-26, 20:04 UTC

Extract URLs from text and HTML page

Post by *Alexisback »

Hi everyone, :)
A question: is it possible to extract URLs from text or an HTML page with Total Commander or other tools? :roll: :idea:
Thanks
ghisler(Author)
Site Admin
Posts: 48075
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: Extract URLs from text and HTML page

Post by *ghisler(Author) »

You can view the html file with F3, and then copy the URL (or all of them together) via right click.
Author of Total Commander
https://www.ghisler.com
Alexisback
Junior Member
Posts: 80
Joined: 2016-10-26, 20:04 UTC

Re: Extract URLs from text and HTML page

Post by *Alexisback »

I did not explain myself well.
From a web page or text that contains code and other content, I want to extract only the URLs
starting with the prefix http, https, or ftp.
Nothing else interests me.
Stefan2
Power Member
Posts: 4153
Joined: 2007-09-13, 22:20 UTC
Location: Europa

Re: Extract URLs from text and HTML page

Post by *Stefan2 »

Alexisback wrote: 2018-10-28, 16:31 UTC Is it possible to extract URLs from text or an HTML page with Total Commander or other tools?
Yes, there are tools for that.

With TC, for example, there is this: "[WCX] RegXtract - String Extractor with RegEx - RegXtract packer plug-in"
viewtopic.php?f=6&t=38638
2milo1012

I, OTOH, would use a text editor or a script, or both, e.g. with EmEditor or PSPad. There are many examples of that on Google.
Alexisback
Junior Member
Posts: 80
Joined: 2016-10-26, 20:04 UTC

Re: Extract URLs from text and HTML page

Post by *Alexisback »

Stefan2 wrote: 2018-10-30, 13:09 UTC
Yes, there are tools for that.

With TC, for example, there is this: "[WCX] RegXtract - String Extractor with RegEx - RegXtract packer plug-in"
viewtopic.php?f=6&t=38638
2milo1012

I, OTOH, would use a text editor or a script, or both, e.g. with EmEditor or PSPad. There are many examples of that on Google.

Thanks :D

I'll try and see what I can do,
even though I do not know regular expressions :cry:
Alexisback
Junior Member
Posts: 80
Joined: 2016-10-26, 20:04 UTC

Re: Extract URLs from text and HTML page

Post by *Alexisback »

I use Notepad++.
Is it possible to integrate it? :roll:
Alexisback
Junior Member
Posts: 80
Joined: 2016-10-26, 20:04 UTC

Re: Extract URLs from text and HTML page

Post by *Alexisback »

This works :D

Thanks :wink:

In Notepad++, in the Replace dialog (Ctrl+H), you can do the following:

Find:

Code:

.*?(http\:\/\/www\.[a-zA-Z0-9\.\/\-]+)

Replace:

Code:

$1\n

Options: check "Regular expression" and ". matches newline".

This will give you a list of all your links. There are two issues, though:

1- The regex provided for matching URLs is far from generic enough to match any URL (it only matches URLs starting with http://www.). If it works in your case, that's fine; otherwise check the question linked below.
2- It will leave the text after the last matched URL intact. You have to delete it manually.

https://stackoverflow.com/questions/19717092/regex-filter-links-from-a-document
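
For anyone who prefers a script, here is a minimal sketch in Python of a somewhat more general pattern that covers the http, https and ftp prefixes asked about earlier in this thread. The names URL_PATTERN and extract_urls are made up for illustration, and the pattern is an assumption about what counts as a URL (everything up to the next whitespace, quote or angle bracket), not a complete URL grammar.

```python
import re

# Assumed pattern: scheme (http, https or ftp), "://", then a run of
# characters that are not whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"""\b(?:https?|ftp)://[^\s"'<>]+""", re.IGNORECASE)


def extract_urls(text):
    """Return the URLs found in text, in order of appearance, without duplicates."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Heuristic: strip punctuation that usually belongs to the sentence,
        # not to the URL itself.
        url = match.group(0).rstrip(".,;:!?)")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    import sys

    # Usage: python extract_urls.py page.html > links.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for found in extract_urls(handle.read()):
            print(found)
```

Unlike the Notepad++ replace, this leaves the source file untouched and nothing has to be deleted by hand afterwards. The trailing-punctuation stripping is only a heuristic and may cut a URL that legitimately ends in one of those characters.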
Alexisback
Junior Member
Posts: 80
Joined: 2016-10-26, 20:04 UTC

Re: Extract URLs from text and HTML page

Post by *Alexisback »

The problem with Notepad++ is that regular expressions cannot be saved.
It would take a tool with a database to store the snippets.
Does something like this exist? :roll:
ts4242
Power Member
Posts: 2081
Joined: 2004-02-02, 20:08 UTC

Re: Extract URLs from text and HTML page

Post by *ts4242 »

Alexisback wrote: 2018-10-30, 12:50 UTC From a web page or text that contains code and other content, I want to extract only the URLs
starting with the prefix http, https, or ftp.
Nothing else interests me.
Just as ghisler(Author) answered, Lister can do this. Here are the detailed steps:

1- Put the cursor on the web page file.
2- Press <F3> to open the file with Lister.
3- From the Options menu, select "5 HTML text (strip tags)" (TC usually auto-detects the file contents and preselects that option).
4- Right-click on a link or on white space and select Copy URL or Copy all URLs.
5- Go to your text editor and paste.
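
If the same result is needed outside Lister, a rough scripted equivalent of "Copy all URLs" can be sketched in Python with the standard html.parser module. This is not how Lister works internally. LinkCollector and collect_links are hypothetical names, and looking only at href and src attributes is an assumption about where the links live.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/src attribute values that start with http, https or ftp."""

    PREFIXES = ("http://", "https://", "ftp://")

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value and value.lower().startswith(self.PREFIXES):
                self.urls.append(value)


def collect_links(html_text):
    """Return the absolute http/https/ftp links found in tag attributes."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls
```

Relative links, mailto: addresses and URLs that appear only in the visible text are skipped by this sketch, so the result can differ from what Lister copies.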
Alexisback
Junior Member
Junior Member
Posts: 80
Joined: 2016-10-26, 20:04 UTC

Re: Extract URLs from text and HTML page

Post by *Alexisback »

ts4242 wrote: 2018-10-30, 20:06 UTC
Just as ghisler(Author) answered, Lister can do this. Here are the detailed steps:

1- Put the cursor on the web page file.
2- Press <F3> to open the file with Lister.
3- From the Options menu, select "5 HTML text (strip tags)" (TC usually auto-detects the file contents and preselects that option).
4- Right-click on a link or on white space and select Copy URL or Copy all URLs.
5- Go to your text editor and paste.

Thank you :D