Thursday, September 15, 2022

Chart Data Tables

Chart data table is a feature of charts, that presents in the chart area a data table with the values that are visualised by the chart. The data table is positioned automatically at the bottom of the chart, and can for certain chart types replace the X-axis labels. Until now this feature has been missing in LibreOffice, but thanks to the funding of NGI, it is now implemented.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871498




Figure 1: Charts with a data table

Chart data table usage and description

Figure 2: Insert data table dialog


The chart data table can be added to the chart from the menu (Insert -> Data Table...), where it is possible to select to show the data table and set the data table specific properties (see Figure 2). The data table specific properties are:

  • Show Horizontal Border
  • Show Vertical Border
  • Show Outline
  • Show Keys
The properties "Show Horizontal Border" and "Show Vertical Border" control if the inner horizontal or vertical borders of the data table are shown or not. The "Show Outline" controls if the outline - borders around the data table are shown or not. "Show Key" controls if the legend keys for the data series are shown in the data table in addition to the data series (row) names.

Figure 3: Data Table Dialog


In addition to those data table specific properties it is also possible to change the line, fill and font properties of the data table (see Figure 3). For example the line properties define how the borders (lines) will be shown, so it is possible to change the line style (continuous line, dashes, dots), the colour of the line, transparency, line thickness. The fill properties defines the colour of the cell background and the font properties what font and size the is used for the text in the data tables.

Implementation

The data table implementation is located in the chart2 component, like all the other chart related code. A data table is represented in the model by a new DataTable (chart2/source/inc/DataTable.hxx) class (extending the XDataTable interface) and lives on the "Diagram" object. If there is no DataTable object, it means the data table is turned off. The DataTable holds the data table specific properties as well as line, fill and font properties.

The automatic positioning of the data table is done similar like the x-axis labels, as the data table is also meant to replace the x-axis labels, unless it is not able to do so, because of the chart type (for example the "bar chart", which has the main x and y axis swapped). 

Rendering is done on the DataTableView class (chart2/source/view/inc/DataTableView.hxx), which creates a table shape and positions that into the chart. The row headers (data series names) and column headers (x-axis names) and values (from the data series) are filled into the table cell by cell, where also the cell properties are mapped from the model (DataTable class), so the data table looks correctly.

Document format support

Data table is supported by OOXML (c:dTable element). There was already present reading and writing of the basic data table properties ("horizontal border", "vertical border" and "outline" but not "keys") to preserve the document formatting even when LibreOffice couldn't render the data table itself. The properties however were not present at a convenient place (directly on the Dialog object in the model) so this had to be refactored to use a DataTable class instead. Now the OOXML support can not only preserve the complete properties (including line, fill, font) and of course also render the data table properly. 

Support for the ODF format had to be added into the LibreOffice extended namespace. This was done with a "data-table" element that was added to the "chart:chart" element. The "data-table" element only has a link to a certain style instance (linked with "style-name" attribute to a style:style element with the same "name" attribute). The data table specific properties are attributes of the "chart-properties" element ("loext:show-horizontal-border", "loext:show-vertical-border", loext:show-outline", "loext:show-keys" attributes), that can be added to a style. The style can also have "graphic-properties" and "text-properties", which are mapped to line, fill and text properties of the data table on import (and vice-versa on the export).

The support for the data tables is currently available in LibreOffice master and will be present as a feature of LibreOffice 7.5 when released. 
 

Monday, March 7, 2022

Sparklines in Calc

Sparklines are mini charts available in OOXML (XLSX) documents, but until now  were not supported by LibreOffice Calc. Thanks to the funding of NGI, this missing feature is now being implemented.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871498






To add support in LibreOffice for sparklines, we need to first read them into the LibreOffice data model, but the data model for sparklines doesn't yet exists, so we need to create that first. Sparklines are defined for one cell, but multiple sparklines can be grouped together into a group, which shares the same properties for rendering the sparkline. The unique data that is defined only for one sparkline is the data range, that a sparkline will use for rendering.

With this in mind we create a data model that consists of classes: SparklineCell -> Sparkline -> SparklineGroup, where SparklineCell is "added" to a cell and just holds a pointer to the Sparkline.

There are 3 types of sparklines supported in OOXML: line, column and stacked. The "line" type renders a line for the data range, "column" type shows each data point as a column bar, and "stacked" is a win/loss column bar, that shows if the data is positive or negative. 


Figure 1: "Line" and "Column" sparklines in LibreOffice Calc

There are many properties for a sparklines that can be customised by the user. These are some of them:
  • First Point, Last Point - if enabled, it shows the first and/or last point in different custom color (See Figure 1 - "column" sparkline in A3 cell, first is green and last is blue).
  • High Point, Low Point - if enabled, it shows the highest point and/or lowest point in different custom color (See Figure 1 - "column" sparkline at A6 cell, low points are in yellow).
  • Negative Point - if enabled, it shows the negative points in a different custom color (See Figure 1 - "column" sparkline at A6 cell, negative points are red).
  • Markes - if enabled, it shows the markers (only for "line" type) (See Figure 1 - "line" sparkline at A2 cell - shows markers).
  • Axis - if enabled, it shows the axis line
  • Right-to-left - if enabled, it shows the data in the right to left order
  • ...

Figure 2: Horizontal and vertical sparklines of all 3 types


Once we have the data model ready we can render the sparklines in the cell area. In Calc this currently looks like in Figure 1 and Figure 2. The Figure 1 shows examples of "line" and "column" sparklines (2 of each), and Figure 2 shows all three types for a block of (random) data (horizontally and vertically).

Currently the code for this is in a feature branch (feature/sparklines), but is in the process of being up-streamed to master. The feature will be available in LibreOffice 7.4.

Next step

With the current implementation, we can open a XLSX document with sparklines and they will be rendered in LibreOffice Calc, but we can't save the document and preserve the sparklines yet. It is also not possible to create new sparklines from scratch or change any of the properties yet (there is no UI for it). ODF support is also missing.

This things will be implemented in the following weeks, and when they are ready, I will blog about them again. 

Monday, September 13, 2021

Document searching and indexing export - Part 3

Milestone 3 - Gluing everything together into a search solution

In the part 1 we looked into indexing XML export, and in the part 2 into rendering a search result as an image. In this part we will glue together both parts with an indexing search engine (Solr) into a full solution for searching and introduce a "proof of concept" application for searching of documents.

Thanks to NLnet Foundation for sponsoring this work.

Solr search platform

Apache Solr is a popular platform for searching and the idea is to use it as our search and indexing engine. First we need to figure out how to put the indexing data from our indexing XML into Solr. Solr uses the concept of documents (not to be confused with a LibreOffice document), which is an entry in the database, which can contain multiple fields. To add documents into the database, we can use a specially structured Solr XML file (many others are supported, like JSON) and simply send it using a HTTP POST request.   

So we need to convert our indexing XML into Solr structure, which is done like this:
  • Each paragraph or object is a Solr document.
  • All the attributes of a paragraph or object is an field of a Solr document.
  • The paragraph text is stored in a "content" field.
  • An additional field is "filename", which is the name of the source (Writer) document.
For example:
    <paragraph index="9" node_type="writer">Lorem ipsum</paragraph>

transforms to:
    <add>
      <doc>
        <field name="filename">Lorem.odt</field>
        <field name="type">paragraph</field>
        <field name="index">9</field>
        <field name="node_type">writer</field>
        <field name="content">Lorem ipsum</field>
      </doc>
      ...
    </add>

Searching using Solr

Solr has a extensive API for querying/searching, but for our needs we just need a small subset of those. Searching is done by sending a HTTP GET to Solr server. For example with the following URL in browser:

http://localhost:8983/solr/documents/select?q=content:Lorem*

"documents" in the URL is the name of the collection (where we put our index data), "q" parameter is the query string, "content" is the field we want to search in (we put the paragraphs text in "content" field) and "Lorem*" is the expression we want to search for.


Proof of concept web application

Figure 1: Search "proof of concept" web application


The application is written in python for the server side processing and HTTP server and the client side HTML+JavaScript using AngularJS (for data binding, REST services) and Bootstrap (UI). The purpose of the web app is to demonstrate how to implement searching and rendering in other web applications.

The web app (see Figure 1) shows a list of documents in a configurable folder, where each document can be opened in Collabora Online instance. On top there is a edit filed and the "Search" button, with which we can search the documents, and a "Re-Index Documents" button, which triggers re-indexing of all the documents. 
Figure 2: Search "proof of concept" web application - Search Results

After we enter a search expression and click the "Search" button, we get a page with search results, which is a table of the document filename and the rendered image from the document, where in the document the search result has been found. See Figure 2 for an example.
There is a "Clear" button at the bottom, which clears the search results and shows the initial list of documents again.

About Server.py - REST and HTTP server

The server has the following services:
  • Provide the HTML and JS documents to the browser, so the web app can be shown
  • GET service "/document" - returns a list of documents
  • POST service "/search" - triggers a query in Solr and returns the result
  • POST service "/reindex" - triggers the re-indexing process
  • POST service "/image" - triggers rendering of an image for the input search result, and returns the image as base64 encoded string

Re-indexing service

Re-indexing glues together the "convert-to" service of the Collabora Online server, to get the indexing XML for a input document, conversion of the indexing XML to Solr supported XML and updating the entries in the Solr server.

Search service

Search service is using the Solr query REST service to search, and transforms the result to a JSON format, that we can use in the web app and is also compatible to use as an input to render a search result.

Image service

Sending a search result and the document to "render-search-result" HTTP POST service on Collabora Online server, the image of the search result is rendered and sent back. For easier use in the web client, the image is converted to base64 string. 

Demo video

Video showing searching in the WebApp:


Video showing re-indexing in the WebApp:




Proof of concept web app source location and relevant commits

The proof of concept web application is located in Collabora Online source tree inside the indexing sub-folder. Please check the README file on how to start it up.

Collabora Online:

Fixes and changes for LibreOffice core:




Tuesday, August 17, 2021

Document searching and indexing export - Part 2

Milestone 2 - Rendering an image of the search result


In the part 1, I talked about the functionality added to LibreOffice to create indexing XML file from the document, which can be used to feed into a search indexing engine. After we search, we expect a search hit will contain the added internal node information from the indexing XML file. The next step is that with the help of that information, we now render that part of the document into an image.

Thanks to NLnet Foundation for sponsoring this work.

Figure 1: Example of a rectangle for a search string

Calculating the result rectangle

To render an image, we first need to get the area of the document, where the search hit is located. This is implemented in the SearchResultLocator class, which takes SearchIndexData that contains the internal model index of the hit location (object and paragraph). The algorithm then finds the location of the paragraph in the document model, and then it determines, what the rectangle of the paragraph is.

The search hit can span over multiple paragraphs, so we need to handle multiple hit locations. With that we get multiple rectangles, which need to be combine into the final rectangle (union of all rectangles). See figure 1 for an example.

Rendering the image from the rectangle in LOKit

This part is implemented for the LOKit API, which can already handle rendering part of the document with an existing API, using rendering of the tiles.

The new function added to the API is:
bool renderSearchResult(const char* pSearchResult, unsigned char** pBitmapBuffer, int* pWidth, int* pHeight, size_t* pByteSize);

The method renders an image for the search result. The input is the pSearchResult (XML), and pBitmapBufferpWidthpHeightpByteSize are output parameters.

If the command succeeded, the function returns true, the pBitmapBuffer contains the raw image, pWidth and pHeight contain the width and height of the image in pixels, and pByteSize the byte size of the image. 

What happens internally in the function is, that the content of pSearchResult is parsed with a XML parser, so that a SearchIndexData can be created and send to SearchResultLocator to get the rectangle of the search hit area. A call to doc_paintTile then renders the part of the document enclosed by the rectangle to the input pBitmapBuffer.  

See desktop/source/lib/init.cxx - function "doc_renderSearchResult"

Collabora Online service "render-search-result"

To actually be useful, we need to provide the functionality in a form that can be "glued" together with the search provider and indexer to show the rendered image of the search hit from the document. For this we have implemented a service in the Collabora Online. The service is a just a HTTP POST request/response, where the in the request we send the document and the search result to the service, and the response is the image.

What the service does is:
  • load the document
  • run the "renderSearchResult" with the search result XML
  • interpret the bitmap and encode into the PNG format
  • return the PNG image
As an example how the service can be used, see in Collabora Online repository: test/integration-http-server.cpp - test method HTTPServerTest::testRenderSearchResult 

The following commits are implementing this milestone 2 functionality:

Core:
 

Thursday, July 1, 2021

Document searching and indexing export - Part 1

About the idea

Searching for a phrase in multiple documents is not a new thing and many implementations exist, however such searching will usually only provide you if and roughly where in the documents a searched phrase exists. With Collabora Online and LibreOffice we can do better than this and in addition provide the search result in form of a thumbnail of the search location. In this way it is easier for the user to see the context, where the searched phrase is located. For example, if it is located in a table, shape, footer/header, or is it figure text or maybe "alt" text of an image. 

Thanks to the sponsor of the work - NLnet Foundation, we are implementing this solution for Writer documents.

The solution to this consist in 3 parts:

  • preparing the data for indexing, 
  • indexing and searching 
  • rendering of the result
Preparing the data for indexing and rendering of the search result is done in LibreOffice core, while the actual indexing and searching is delegated to one of the existing indexing and searching databases / frameworks (we will provide support for Apache Solr). 

In this post I will describe what has been done for milestone 1.

Milestone 1 - preparing data for indexing

Indexing data usually consists of (enriched) text, however in our case we also need to provide additional internal information, where the text is located, so it is possible to later go to the search result location and create a thumbnail of the document. In Writer we can provide a node index of the paragraph, with which it is possible to quickly identify the text in the document model and generate a thumbnail of the area around the text.

The data for indexing is provided by a "indexing export" filter in LibreOffice, which creates a XML document with a custom structure. The root element is <indexing> and the child elements are paragraphs with index and text, which can be nested in sub-elements (like image, shape, table, section) depending on where the paragraph is located. 

For example:

 <?xml version="1.0" encoding="UTF-8"?>
<indexing>
 <paragraph index="6">Drawing : Just a Diamond</paragraph>
 <paragraph index="12"></paragraph>
 <shape name="Circle" alt="" description="">
  <paragraph index="0">This is a circle</paragraph>
  <paragraph index="1">This is a second paragraph</paragraph>
 </shape>
 <shape name="Diamond" alt="" description="">
  <paragraph index="0">This is a diamond</paragraph>
 </shape>
 <shape name="Text Frame 1" alt="" description="">
  <paragraph index="0">This is a TextBox - Para1</paragraph>
  <paragraph index="1">Para2</paragraph>
  <paragraph index="2">Para3</paragraph>
 </shape>
</indexing>

The indexing export is build upon a ModelTraverser class, which was created for the indexing purpose, but can be reused for other purposes (it is similar to what AccessibilityCheck does, but generalised, so AccessibilityCheck can in the future be refactored to use it). 

The purpose of ModelTraverser is to traverse through the Writer document model, and provide SwNode and SdrObjects to the consuming objects - in our case IndexingExport class, which extracts the text from those objects (depending on the object type) and with help of a XmlWriter, writes the indexing data to the XML file.

Indexing export filter can be tested with the LibreOffice command line "convert-to" tool in the following way:

soffice --convert-to xml:writer_indexing_export <Writer document file path>


The commits implementing this milestone 1 functionality:

In the next milestone, we will render the thumbnail with the provided search result data.


To be continued...

Thursday, May 13, 2021

Command Popup HUD for LibreOffice

Command Popup is a pop-up window that lets you search for commands that are present in the main menu and run them. This was requested in bug tdf#91874 and over-time accumulated over 14 duplicated bugs reports, so it was a very requested feature.

I'm intrigued by similar functionality in other programs, because it enables very quick access to commands (or programs) and at the same time don't need to move your hand off the keyboard. It also makes it easy to search for commands - especially in an application like LibreOffice with humongous main menu. So I decided to try to implement it for LibreOffice.

Figure 1: Command Popup window

I was working on it here and there in my free time and managed to make it work as I imagined, however it was very rough around the edges and needed a lot of polish. Luckily in April, we had a hack week at Collabora, where I decided to use some time to work on finishing the command popup. I dusted up the old code and converted it to use the weld framework for widgets and fixed the many bugs, but I didn't manage to finish it completely so it took until recently that I actually pushed the code upstream into master.

The main UX focus is to easy search and navigate with the keyboard. When the Command pop-up is focused, all keyboard events should go to the search edit box, so it is possible to change the search term, however hitting up/down should change the selection in the tree view, where the search results are shown, and enter should execute the command. To get this working correctly was quite a challenge, but I found the correct formula eventually after trying some different ideas. Of course using the mouse should still work as well. 

To show the Command Popup, there is a menu entry in "Help > Search Commands" and is by default bind to "Ctrl+F1" shortcut (however this may change). 

The Command Popup will be available in LibreOffice 7.2, but if you want to try it out, you can get the current daily build, or wait for the LibreOffice 7.2 Alpha1. Any suggestions and comments are welcome. If you find a bug, please report it in the LibreOffice bugzilla page.


Wednesday, March 24, 2021

Built-in "Xray" like UNO object inspector – Part 3

DevTools implementation has been completed and this is the third and final part of the mini-series. The focus of this part is on the object inspector, but I have also improved or changed other DevTools parts, so first I will briefly mention those.

New menu position


Figure 1: Development Tools location in the menu

The position in menu has changed from "Help / Development Tools" to "Tools / Development Tools". The new position fits better as it is near the position of macro editor and they go hand in hand with each-other.

Updates to the document model tree view



Figure 2: Document model tree view and "Current Selection" toggle button

The left-hand side document model tree view has been changed to use the same approach than the object inspector tree view uses, where the object attached tree view (in this case DocumentModelTreeEntry) has all the behavioural logic for a specific tree view node. This makes it easier to manipulate the tree view and add new nodes (if necessary).

In the document object tree view I have added a top toolbar with "Refresh" (icon) button and I changed the existing “Current Selection” button to a toolbar toggle button, so it looks more consistent as the is only the toolbar now (see figure 2).

In Writer, each paragraph tree view node now has text portion child nodes (as shown in figure 2), to make it possible to quickly inspect all the individual parts of a paragraph.

The object inspector tree view

Figure 3: Top toolbar and tab bar in object inspector


The object inspector shows various information about an object. Each object has a implementation class, which is always shown for the current object (see figure 3).

The other information that the object inspector shows are divided into four main categories:
  • "Interfaces" - the interfaces that the current object implements
  • "Services" - the services that the current object supports
  • "Properties" - the properties of the current object
  • "Methods" - the combined methods that can be called on the current object
On the user interface, the categories are divided with a tab bar (see figure 3). Each tab represents a different categories. The tabs are filled on "entry" - when the user clicks on the tab.

In the code the tabs and tree views are all handled by the ObjectInspectorTreeHandler and the hierarchy of objects attached the tree view that implement the ObjectInspectorNodeInterface (see include/sfx2/devtools/ObjectInspectorTreeHandler.hxx). 

The two major categories are "Properties" and "Methods", which I describe in more detail next.

Properties

Figure 4: Object inspector "Properties" tab


There are three types of properties of an object:
  • Properties accessible view XPropertySet.
  • Properties defined as an attribute (marked "[attribute]" in IDL). 
  • Pseudo properties defined by a get and set method. 
All the three types are represented in the properties tab and can be identified by the "Info" column. Attribute properties have an "attribute" flag, pseudo properties have either a "get", "set" or both flags. If neither flags exists, it represents a property that is from XPropertySet. 

In the tree view there are four columns for each property - "Name", "Value", "Type" and already mentioned "Info". The "Name" of the property is always available, but it is possible that two properties have the same name (because of multiple property types). The "Value" shows the value of the property, which is converted to the string, if this is possible (it should be if the property is basic), otherwise a representation string is created. The representation string for objects shows the implementation name ("<Object@SwXTextRange>"), for sequences it shows the size ("<Sequence [5]>") and for structs it just mentions the type ("<Struct>").

A node in the “Properties” tree view can be expanded (if offered) so it is possible to recursively inspect the the objects further. In case the property is a struct, it shows the struct members, and if it is a sequence, it shows the indices each object has in the sequence.

There are special properties, which names start with "@". This are added for convenience, so it is possible to inspect objects, that implement XIndexContainer, XNameContainer or XEnumeration interfaces. When such an object is found, the entries gathered using those interfaces are added to the tree view, and are prefixed with "@". For example "@3" (XIndexContainer)   "@PageStyles" (XNameContainer or XEnumeration). The type of the special property is written in the "Info" column ("index container", "name container" and "enumeration" flags). Note that this functionality is not present in Xray or the Macro editor's debugger, but was added for convenience.

Figure 5: "Properties" tab and the text view

On the bottom of the "Properties" tab there is a text view, which shows the full value of the current selected property. In the tree view, the value shown is always using a short form (shortened to 60 chars with new-line characters removed) to not make the tree view too wide. The full value therefor is written in the text view, where it can be inspected in full and has also a working copy/paste (see figure 5).

Object Stack

Related to the "Properties" is the object stack. It is possible to select an object in the tree view and inspect the object (either using the context menu or the toolbar "Inspect" action). In this case the object in the object inspector will change to the selected one. This is convenient when you are only interested in one object down in the tree view hierarchy and want to inspect only that. In that case the previous object will be added to the stack, and can be returned to with hitting the "Back" button in the toolbar.

Note that going to another object (not using "Inspect" action) will always remove the object stack. 

Methods

Figure 6: Object inspector "Methods" tab


The "Methods" tab contains a tree view that shows all the methods, that can be called for the current object. Each method is represented by four columns (see figure 6):
  • "Method" - name of the method
  • "Return type" - the return (simplified) type of the method
  • "Parameters" - list of input parameters, where each one lists the direction ("in", "out" or "in/out"), the parameter name and the simplified type 
  • "Implementation Class" - class/interface where the method is implemented
Currently the types of parameters and return types are simplified, with only basic types, "void", "any", "sequence" and "object" that represents all the objects and the type of the object isn't written. The reason for this is to make it easier to read. 

Future ideas

There are many improvements that can still be made, but aren't included in the current implementation. 

I think it would be quite convenient to have the ability to open a object inspector in mode-less dialog separate to the DevTools, just to quickly look up a property. 

Another big upgrade would also be the ability to change values of basic types for the properties and structs, so it is possible to quickly see what effect the change would have. Similar to changing property values is to call methods with defining the parameters, but only if the parameters are basic types.

My initial vision of DevTools was not that it will be only one tool (object inspector), but more tools just like the development tools in the browser, so I'm sure there will be more useful things integrated over time.

I think there are a lot of ideas you may also have, so please tell me if you have a good one. Of course if you find something that is not working as expected, please let me know.

Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible.