Opened 8 months ago

Last modified 7 months ago

#18912 assigned bug

HaikuDepot: Text Engine Handling of `\r` as well as `\n`

Reported by: apl-haiku Owned by: apl-haiku
Priority: normal Milestone: Unscheduled
Component: Applications/HaikuDepot Version: R1/beta4
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

It was pointed out in [this](https://review.haiku-os.org/c/haiku/+/7723) change that the small Text Engine in HaikuDepot does not handle `\r`; it should be able to handle sequences such as `\r\n`.

Change History (8)

comment:1 by pulkomandy, 8 months ago

Should that be done in the text engine?

It seems to me that this can be handled entirely at import (split the text into lines/paragraphs, no matter what the input uses to separate them) and at export (put either \n or \r\n between the lines/paragraphs).
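A minimal sketch of that import/export idea, for illustration only: the helper names `SplitIntoParagraphs` and `JoinParagraphs` are hypothetical and are not taken from the HaikuDepot sources. It assumes that `\r\n`, a lone `\r` and a lone `\n` each count as exactly one paragraph break on import, and that the caller picks the separator on export.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical import helper: split raw input into paragraphs, treating
// "\r\n", a lone '\r' and a lone '\n' each as a single paragraph break.
std::vector<std::string>
SplitIntoParagraphs(const std::string& input)
{
    std::vector<std::string> paragraphs;
    std::string current;
    for (size_t i = 0; i < input.size(); i++) {
        char c = input[i];
        if (c == '\r' || c == '\n') {
            // Consume the '\n' of a "\r\n" pair so the pair counts once.
            if (c == '\r' && i + 1 < input.size() && input[i + 1] == '\n')
                i++;
            paragraphs.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // Whatever follows the last break (possibly nothing) is a paragraph too.
    paragraphs.push_back(current);
    return paragraphs;
}

// Hypothetical export helper: join paragraphs with the separator the caller
// wants in the output, for example "\n" or "\r\n".
std::string
JoinParagraphs(const std::vector<std::string>& paragraphs,
    const std::string& separator)
{
    std::string output;
    for (size_t i = 0; i < paragraphs.size(); i++) {
        if (i > 0)
            output += separator;
        output += paragraphs[i];
    }
    return output;
}
```

With helpers like these, the internal model never sees a `\r`; mixed line endings in the input are not preserved on a round trip.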

It depends on whether we plan to use it only as a rich text engine or also as a generic text editor, which should handle all kinds of "invalid" input (mixed encodings, different types of line endings possibly mixed together in the same file, other control characters, ...).

comment:2 by apl-haiku, 7 months ago

The way that the text-engine works currently is that it models "paragraphs" within a "document" internally. The document supports replacement of arbitrary text (including newlines) within the document. This requires reconciliation of the paragraphs involved, and so it is the document that takes responsibility for breaking the inbound text into paragraphs.
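To illustrate what that reconciliation involves, here is a simplified sketch. It is not the HaikuDepot code: the `Position` struct and the `ReplaceRange` function are hypothetical, paragraphs are stored without their terminating `\n`, and the caller is assumed to pass valid positions with `start` not after `end`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical position inside a paragraph-based document.
struct Position {
    size_t paragraph;   // index of the paragraph
    size_t offset;      // byte offset within that paragraph
};

// Replace the text between start and end with new text that may itself
// contain '\n'. Assumes both positions are valid and start <= end.
void
ReplaceRange(std::vector<std::string>& paragraphs, Position start,
    Position end, const std::string& text)
{
    // Text that survives before the start and after the end of the range.
    std::string head = paragraphs[start.paragraph].substr(0, start.offset);
    std::string tail = paragraphs[end.paragraph].substr(end.offset);

    // Break the inbound text into paragraph pieces on '\n'.
    std::vector<std::string> pieces;
    std::string current;
    for (char c : text) {
        if (c == '\n') {
            pieces.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    pieces.push_back(current);

    // Reconcile: the first piece joins the surviving head and the last
    // piece joins the surviving tail.
    pieces.front() = head + pieces.front();
    pieces.back() += tail;

    // Swap the affected paragraphs for the reconciled pieces.
    paragraphs.erase(paragraphs.begin() + start.paragraph,
        paragraphs.begin() + end.paragraph + 1);
    paragraphs.insert(paragraphs.begin() + start.paragraph,
        pieces.begin(), pieces.end());
}
```

Even in this reduced form, a single replacement can merge, split, or delete paragraphs, which is why the paragraph-breaking ends up living next to the replacement logic.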

It would probably be better to refactor the system so that it has a contiguous text storage, with changes made to that storage and the text only later represented as paragraphs. As built, however, the paragraph-breaking probably has to continue to happen in the document.

comment:3 by pulkomandy, 7 months ago

I think the current code works more or less as expected:

  • The \r char is ignored. If you attempt to paste some text that contains \r\n, this results in a \n, which is handled correctly internally. As a result, the text is automatically "cleaned" and converted into something that the text layout engine can handle.
  • Converting the text back into DOS/Windows line endings can be done when exporting to a text file if desirable.
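The behaviour in the first point can be sketched as follows. This is an illustration under the assumption that "ignored" means every `\r` is simply dropped; `DropCarriageReturns` is a hypothetical name, not a function in the HaikuDepot sources.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper mirroring the behaviour described above: every '\r'
// is discarded and every other character is kept unchanged.
std::string
DropCarriageReturns(std::string text)
{
    text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
    return text;
}
```

Under this assumption `"a\r\nb"` becomes `"a\nb"`, while text that uses a lone `\r` as its line separator, such as `"a\rb"`, collapses to `"ab"` and loses its line break.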

I think it makes sense that the internal data model is restricted in this way, and it simplifies the internals a lot. It creates some constraints (you can't edit an arbitrary file, for example with mixed line endings, and expect it to work). But I think this constraint is acceptable. I guess it depends what we plan to do with this code. If the goal is to replace BTextView in all usages, including StyledEdit, yes, we need to handle this. Is that the goal?

comment:4 by apl-haiku, 7 months ago

> If the goal is to replace BTextView in all usages, including StyledEdit, yes, we need to handle this. Is that the goal?

It would be very good to have a "TextKit" or similar that carries a general-purpose text-engine.

Applications such as StyledEdit and HaikuDepot are relatively simplistic in their text requirements so we would need to make a distinction between a text engine that covers 95% of cases like these and one that covers all cases including what you might find in an advanced desktop publishing system. If we're wanting to cover all cases then we need something more like the Paige system that we've been discussing in the forums and this code is not going to be enough for that.

If we're happy with a 95% "typical" use case coverage, which is what I would advocate for, then we could use this code as a base but I think it would be prudent to do a refactor and arrange the text storage contiguously first -- for example, the way the Cocoa Text Engine has it. This would mean changing a lot of the code and that might be a good moment to better deal with the special characters as well as the newline / paragraph handling.

comment:5 by apl-haiku, 7 months ago

It's probably worth noting that layout and editing of some non-Latin scripts are very complex, so if we did implement a general-purpose text engine via code such as this, we would need to be aware that achieving support for a wide range of scripts may be difficult given the resources that this project has.

comment:6 by pulkomandy, 7 months ago

I think the 95% goal is not going to work well if we are trying to cover both the usage in StyledEdit (multiple encodings, preserves unknown control characters, handles mixed line endings) and the one in HaikuDepot and other similar apps (text is split into lines/paragraphs, we can assume UTF-8/Unicode, no control characters allowed but extensions for images, margins, etc). In StyledEdit, we need to preserve the text byte-by-byte, even if it's invalid UTF-8 in places: we want to be able to open the file, change some things, and save it again. There were bug reports from people trying to do this even on binary files with bits of text inside.

In the GSoC idea "modular edit view", there were also discussions about making the same API also handle "sourcecode" style views (monospace fonts, line numbering, syntax highlighting, etc).

To me this looks like two very different uses for a text view. I am not yet convinced that an API that tries to cover both is going to work well: it's going to be quite complex to set up for the API users and quite complex to implement for us.

So I would be fine with the existing BTextView remaining in place for things like StyledEdit, essentially a continuous plaintext representation of the text with separate "text runs" for styling. And the one in HaikuDepot being used for places where more advanced formatting is needed (paragraphs, margins, inline images), at the cost of the internal representation not being continuous arbitrary plaintext in any (mix of) format, but something more restricted: a list of paragraphs, each ended by a single \n character, and always encoded in UTF-8.

comment:7 by apl-haiku, 7 months ago

Imagine a "text storage" object which contains the raw text + run-length attributes such as {12,5}bold. This is the case with StyledEdit but also run-length attributes _could_ include paragraph and paragraph styling. In this way, structure can be projected onto the raw text through the attributes. The text however would remain a contiguous string in a non-destructive manner until it is edited.

For the end developer, describing structure would be a matter of adjusting the text + run-length attributes through a replacement process followed by an attribute repair process followed by re-interpreting the changes. The internal interpretation of paragraphs, bullets etc... would be opaque to the end developer.

It is along these lines that the OPENSTEP text engine works.

I think this is probably enough to handle at least vanilla Markdown and, assuming a single rectangular flow, I can imagine this covering the use-cases of StyledEdit and HaikuDepot, for example. For code-editors and desktop-publishing applications, I would agree these are probably outside the 95%.

comment:8 by marcoapc, 7 months ago

I created a topic on the forum about expanding text editing functions:

https://discuss.haiku-os.org/t/references-for-modular-edit-view-big/15142/2
