How to completely remove HTML code and convert it to text?

makinero
Senior Member
Posts: 268
Joined: 2013-10-26, 10:05 UTC

Post by *makinero »

How can I completely remove HTML/PHP code from a file and convert it to plain text? I tested the best-known tools, and every one of them damaged the text or did not convert it fully. Google does not help. I have tried scripts and utilities, even with UTF-8, without success.
ts4242
Power Member
Posts: 2081
Joined: 2004-02-02, 20:08 UTC

Re: How to completely remove HTML code and convert it to text?

Post by *ts4242 »

Use Lister's "5 HTML text (strip tags)" mode.
Select the HTML file and press <F3>, <5>, <Ctrl+A>, <Ctrl+C>, then open your text editor and press <Ctrl+V>.
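
For batch work outside Lister, the same strip-tags idea can be scripted. Below is a minimal Python sketch, assuming the input is already a correctly decoded string and using only the standard library's html.parser. The names TextExtractor and html_to_text are made up for illustration, and this is not a description of how Lister works internally.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &oacute; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = "<p>Za&#380;&oacute;&#322;&#263; <b>gęślą</b> jaźń</p>"
    print(html_to_text(sample))
```

When reading files, passing the encoding explicitly, for example open(path, encoding="utf-8"), avoids falling back to the platform's default codepage. Layout such as tables and line breaks is not preserved by this sketch.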
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: How to completely remove HTML code and convert it to text?

Post by *Usher »

Lister may work well enough for webpages with a simple layout. There is no really good way to get properly formatted text from "modern" webpages that use megabytes of JavaScript code and CSS styles.
Andrzej P. Wozniak
Polish subforum moderator
makinero
Senior Member
Posts: 268
Joined: 2013-10-26, 10:05 UTC

Re: How to completely remove HTML code and convert it to text?

Post by *makinero »

Is it really not possible to remove simple HTML code without damaging Polish letters?
I have been looking for a solution for several years without success; all the most popular editors have problems displaying the encoding correctly. None of the tutorials on various coding forums solve my encoding problem. What's next?
I do not want to do it manually (copy/paste), because it would take ages: I have a lot of HTML to convert to TXT.
Each damaged character corresponds to exactly one correct letter. I only need a special regex that will automatically convert the damaged letters to the correct ones. There used to be a great tool for this repair, but I remember neither the tool nor the regex.
obeg
Junior Member
Posts: 43
Joined: 2006-09-28, 09:20 UTC
Location: Sweden

Re: How to completely remove HTML code and convert it to text?

Post by *obeg »

Finally, a problem described rather completely from the first post.
Unfortunately, this is not something a file manager is created or used for.

Removing HTML from files requires some kind of editor. Of course, there are many skilled people here with a lot of creative ideas who are often willing to help, but this should perhaps not be discussed in a tool forum.
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: How to completely remove HTML code and convert it to text?

Post by *Usher »

@makinero
In short: TC is not the tool and this is not the forum you are looking for. Please don't ask any more similar questions here.

The longer version follows:
1. Of course you can strip all the HTML tags, but you may lose text formatting, even for very simple HTML without tables.
2. Of course there are tools that can do such stripping for you; just use proper keywords to search for them. Such tools are used, e.g., to remove unneeded HTML from e-mails on mailing lists.
3. If you want to handle Polish and other international characters in text properly, use a text editor that supports Unicode, e.g. Notepad++. Some editors may also provide tools to strip HTML tags.
4. A single regex is NOT the best way to automate search-and-replace work when you have a whole list of replacements to make. It's a job for a script using grep or for a macro created in your favourite text editor.
5. There is a good old Polish codepage converter called Gżegżółka. It can do the same job for Polish text as translit can do for Russian text.
Andrzej P. Wozniak
Polish subforum moderator
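
Point 4 above could look roughly like this as a script: the replacement list is kept as a table and applied in one pass over the text. This is only a Python sketch. The function name replace_many and the sample table are illustrative, not taken from any of the tools mentioned, and the pattern is generated from the table rather than written by hand.

```python
import re

# Illustrative sample pairs only; a real table would list every damaged sequence.
SAMPLE_TABLE = {
    "Ä…": "ą",
    "Ä™": "ę",
    "Å‚": "ł",
}


def replace_many(text, table):
    """Replace every key of `table` found in `text` with its value, in one pass."""
    # Longest keys first, so a longer key wins over a shorter key that is its prefix
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # Each match is looked up in the table; replaced text is never rescanned
    return pattern.sub(lambda match: table[match.group(0)], text)


if __name__ == "__main__":
    print(replace_many("zaÅ‚Ä…cznik", SAMPLE_TABLE))
```

Keeping the pairs in a table means a new damaged sequence is added as one more entry, without touching the code.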
makinero
Senior Member
Posts: 268
Joined: 2013-10-26, 10:05 UTC

Re: How to completely remove HTML code and convert it to text?

Post by *makinero »

I only need to replace 9 damaged lowercase characters and 9 damaged uppercase characters with the correct letters. It seems very simple, but I'm lazy, and I need a regular expression to do it automatically in the future.
Example:
"Ä…"=>"ą"
"Ä‡"=>"ć"
"Ä™"=>"ę"
"Ã³"=>"ó"
"Å‚"=>"ł"
"Å„"=>"ń"
"Å›"=>"ś"
"Å¼"=>"ż"
"Åº"=>"ź"
"Å�"=>"Ł"
"Ã“"=>"Ó"

and more...
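
The pairs listed above look like UTF-8 text that was decoded as Windows-1252 (an assumption based on examples such as "Ä…" for "ą"). If that is the case, no per-letter table is needed: re-encoding the damaged text as Windows-1252 and decoding the bytes as UTF-8 reverses the damage. A hedged Python sketch follows; repair_mojibake is a hypothetical helper name.

```python
def repair_mojibake(text, wrong="cp1252", right="utf-8"):
    """Undo text that was decoded with `wrong` although its bytes were `right`.

    Assumption: the damage is exactly one wrong decoding step.
    If the round trip fails, the text is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them with the intended codec
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Already-correct text or mixed content cannot be round-tripped
        return text


if __name__ == "__main__":
    for damaged in ["Ä…", "Ä™", "Å‚", "Å„", "Å›"]:
        print(damaged, "=>", repair_mojibake(damaged))
```

A sequence such as "Å�" cannot be repaired this way: the second byte of "Ł" has no Windows-1252 character, and the replacement mark shown in its place suggests that byte was already lost. Such cases need a manual fix or a fresh conversion from the original file.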