How to completely remove HTML code and convert it to text?

makinero · Post by *makinero » 2018-12-10, 15:19 UTC

How can I completely remove HTML/PHP code and convert it to text? I tested the best tools, and every one of them damaged the text and did not fully convert it. Google does not help. I have tried scripts and utilities, even with UTF-8, without success.

ts4242 · Post by *ts4242 » 2018-12-10, 19:41 UTC

Use Lister's mode 5, HTML text (strip tags).
Select the HTML file, press <F3>, <5>, <Ctrl+A>, <Ctrl+C>, then open your text editor and press <Ctrl+V>.
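
For many files, the same strip-tags idea can be scripted. Below is a minimal sketch in Python using only the standard library's html.parser. The names TextExtractor and html_to_text are made up for this example, and skipping script/style content and collapsing whitespace are assumptions about the desired result, not something Lister is known to do.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; drops tags and script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &oacute; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

When reading the file, pass the encoding explicitly, e.g. open(path, encoding="utf-8"); relying on the system default codepage is a common way to end up with damaged Polish letters.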

Usher · Post by *Usher » 2018-12-10, 21:07 UTC

Lister may work well enough with a simple webpage layout. There is no really good way to get properly formatted text from "modern" webpages that use megabytes of JavaScript code and CSS styles.

makinero · Post by *makinero » 2018-12-11, 09:42 UTC

Is it not possible to remove simple HTML code in any way without damaging Polish letters?
I have been looking for a solution for several years without success; all the most popular editors have a PROBLEM displaying the encoding correctly. None of the tutorials on various coding forums solve my encoding problem. What's next?
I do not want to do it manually (copy/paste), because it would take ages: I have a lot of HTML to convert to TXT.
Each damaged character corresponds to one letter. I only need a special regex that will automatically convert the damaged letters to the correct ones. There used to be a great tool for this repair, but I no longer remember the tool or the regex.

obeg · Post by *obeg » 2018-12-11, 10:25 UTC

Finally, a problem described rather completely from the first post.
Unfortunately, this is not something a file manager is created or used for.

Removing HTML from files requires some kind of editor. Of course there are many skilled people here with a lot of creative ideas who are often willing to help. But this should perhaps not be discussed in a tool forum.

Usher · Post by *Usher » 2018-12-11, 10:43 UTC

@makinero
In short: TC is not the tool and this is not the forum you are looking for. Don't ask any more similar questions here, please.

Now TL;DR follows:
1. Of course you can strip all the html tags, but you can lose text formatting, even for very simple HTML without tables.
2. Of course there are tools that can do such stripping for you, just use proper keywords to search for them. Such tools are used e.g. to remove unneeded HTML from e-mails on mailing lists.
3. If you want to properly manage Polish and other international characters in text, use a text editor that supports Unicode, e.g. Notepad++. Some editors may also provide tools to strip HTML tags.
4. A single regex by itself is NOT the best way to automate search-and-replace work when you have a whole list of replacements. It's a job for a script using grep or for a macro created in your favourite text editor.
5. There is a good old Polish codepage converter called Gżegżółka. It can do the same job for Polish text as translit can do for Russian text.
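
To illustrate point 4, here is a minimal sketch of a list-driven replacement script in Python. The function name replace_many and the sample table are made up for this example. It builds one alternation from the list and replaces everything in a single pass, so a replacement is never re-processed by a later rule.

```python
import re


def replace_many(text: str, table: dict) -> str:
    """Replace every key of `table` found in `text` with its value, in one pass."""
    if not table:
        return text
    # Longest keys first, so a short key never pre-empts a longer one
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


# Illustrative table only; extend it with your own pairs
SAMPLE_TABLE = {"Ä…": "ą", "Ä™": "ę", "Å›": "ś", "Ã³": "ó"}
```

Usage: replace_many(damaged_text, SAMPLE_TABLE). The table can just as well be loaded from a text file with one pair per line.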

makinero · Post by *makinero » 2018-12-11, 11:07 UTC

I only need to replace the 9 damaged lowercase letters and the 9 damaged uppercase letters with the correct ones. It seems very simple, but I'm lazy and I need a regular expression to do it automatically in the future.
Example:
"Ä…"=>"ą"
"Ä‡"=>"ć"
"Ä™"=>"ę"
"Ã³"=>"ó"
"Å‚"=>"ł"
"Å„"=>"ń"
"Å›"=>"ś"
"Å¼"=>"ż"
"Åº"=>"ź"
"Å�"=>"Ł"
"Ã“"=>"Ó"

and more...
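
The pairs above look like UTF-8 bytes that were read as Windows-1252 (for example, "ą" is the bytes C4 85 in UTF-8, which display as "Ä…" in Windows-1252). Assuming that is what happened to the files, the damage can probably be reversed for all letters at once without any replacement list. The sketch below is a guess based on the listed pairs, not a confirmed diagnosis. The name fix_mojibake is made up for this example.

```python
def fix_mojibake(text: str) -> str:
    """Undo 'UTF-8 read as Windows-1252' damage; return text unchanged on failure."""
    raw = bytearray()
    try:
        for ch in text:
            try:
                raw += ch.encode("cp1252")
            except UnicodeEncodeError:
                # Bytes such as 0x81 have no cp1252 character; some programs
                # pass them through as U+0081, which latin-1 maps back to 0x81
                raw += ch.encode("latin-1")
        return raw.decode("utf-8")
    except UnicodeError:
        # Not this kind of damage (or already correct text): leave it alone
        return text
```

Usage: fix_mojibake("Ä…") gives "ą". If a tool has already replaced the undefined byte with "�" (as in the "Ł" pair above), that byte is lost and the text is returned unchanged, so it is better to run this on the original damaged file.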
