Development in Expression and Spreadsheet

Here's the place for discussion related to coding in FreeCAD, C++ or Python. Design, interfaces and structures.
Forum rules
Be nice to others! Respect the FreeCAD code of conduct!
Post Reply
realthunder
Veteran
Posts: 2190
Joined: Tue Jan 03, 2017 10:55 am

Development in Expression and Spreadsheet

Post by realthunder »

In response to request from eivindkvedalen, one of the original author of the Spreadsheet workbench, I make this post for discussion of my current work on expression and spreadsheet. You may find a more user oriented overview at here. At the current stage, I consider it not matured enough to be considered for merging to master. And as you can see from the overview, there are some security concerns that need to be formally addressed first. In addition, it partially depends on my Link implementation, so it won't happen until Link is merged first.

Please build my LinkStage3 if you want to test it, as there are quite a few fixes since my last AppImage release.

Here is an overview from developer point of view. The expression syntax has been extended to mimic Python. I used a patched version of bison to generate the parser for performance reason. You can find the patch at my repo (ExpressionParser-bison-3.0.4.patch). I have replaced boost::any with a customized implementation at App/stx/any.hpp (forked from here, with some modification), also for performance reason. Specifically, boost::any does dynamic allocation for any value stored, even for bool type. Expression classes are heavily refactored. std::unique_ptr is used everywhere to eliminate most of the raw pointer usage.

I have unified all link type properties using a common parent class called PropertyLinkBase. I added several new link properties that support external link, which all have XLink in their names, implemented in PropertyLinks.cpp. In particular, there is a type called PropertyXLinkContainer, which is designed to contain multiple PropertyXLink inside. Another property, PropertyExpressionContainer is derived from PropertyXLinkContainer which offers interface for FreeCAD to batch set/get expressions. Both PropertyExpressionEngine and PropertySheet are changed to be derived from PropertyExpressionContainer, which adds a few new capabilities to these two,

* When a document contains external referencing expressions are opened, the referenced documents will be opened automatically.
* Document recomputation will auto include any external referenced object with correct ordering
* New commands are available to get/set all expressions from all objects in all documents, implemented in CommandDoc.cpp, class StdCmdExpression.
* Unified handling (like all other types of link property) of common link operations, such as auto link breaking, dependency walking, linked object label tracking, smart geometry reference based on my new topological naming, etc.

I have also just implemented user abortion during document restoring and recomputing, because of possible infinite loop in the expression.

Although I did have performance in mind when refactoring the expression, I haven't got time to do any serious benchmark or performance tuning yet. As for now, simple loop test shows that in release build, the expression evaluates about 25 times slower than Python equivalent (12 times slower if disallow user abortion). This is mostly due to excessive conversion back and forth among expression, App::any and Py::Object. I'll deal with that later.
Try Assembly3 with my custom build of FreeCAD at here.
And if you'd like to show your support, you can donate through patreon, liberapay, or paypal
ickby
Veteran
Posts: 3116
Joined: Wed Oct 05, 2011 7:36 am

Re: Development in Expression and Spreadsheet

Post by ickby »

Hello,

I like the more advanced functionality, and that it looks like python is awesome! The patched bison, is this mandatory or does it work with the default one? Because that sounds like big trouble for maintenance in my ears.

About the possibility to use python modules: do you think this is necessary? For me it feels a bit out of scope of the expression engine and too much trouble regarding the security issue. What would be your scenario for that?
realthunder
Veteran
Posts: 2190
Joined: Tue Jan 03, 2017 10:55 am

Re: Development in Expression and Spreadsheet

Post by realthunder »

ickby wrote: Fri Apr 19, 2019 4:18 pm Hello,

I like the more advanced functionality, and that it looks like python is awesome! The patched bison, is this mandatory or does it work with the default one? Because that sounds like big trouble for maintenance in my ears.
The patch is only for performance reason. It enables c++ move syntax. The thing is, I followed what upstream did, by including the generated parser and scanner source code inside the source tree, so there is no need to run bison or flex during compilation. Only when some one needs to modify the syntax rule, will he need the patched bison.

About the possibility to use python modules: do you think this is necessary? For me it feels a bit out of scope of the expression engine and too much trouble regarding the security issue. What would be your scenario for that?
Yes, it is a nuisance, which is why it is disabled by default. Every python object access is checked and blocked if it is a module. But I still make it possible for user to manually enable individual modules using FC parameter editor. If that turns out to be a security risk as well, I can remove it in release build. The intention is for advanced users to experiment with interfacing with other python modules, probably written by themselves. And they can propose to include that module with FC if it is really useful. There is a mechanism to allow access some python modules as pseudo properties. However, in order for a module to be considered secure enough, a human must be involved to review the code. For example, I personally like cadquery a lot, and want to try some advanced shape generation using expression. I have recently bundled this module in my branch as an experiment, but not before I browsed through its code and stripped out all the file access and script execution function.
Try Assembly3 with my custom build of FreeCAD at here.
And if you'd like to show your support, you can donate through patreon, liberapay, or paypal
ickby
Veteran
Posts: 3116
Joined: Wed Oct 05, 2011 7:36 am

Re: Development in Expression and Spreadsheet

Post by ickby »

realthunder wrote: Fri Apr 19, 2019 11:32 pm The patch is only for performance reason.
How much is the benefit? or the other way around, would it still be feasible to use the improvements without the patch or would it be unbearingly slow? The reason I ask is for future maintenance: there will come the point when one needs to rerun bison, to fix some bug or make a improvement or whatever, and than the patch may already be invalid due to changes in the bison code... If it is too slow with default bison I see a huge risk for the future


The intention is for advanced users to experiment with interfacing with other python modules, probably written by themselves. And they can propose to include that module with FC if it is really useful.
I do understand that this seems nice, however, I'm a bit concerned about it. This makes a FreeCAD document behavior dependent on user preferences, and hence would lead to different behavior or non-recomputeable documents for different users. We had this with the "refine geometry" option, and it was/is a huge headache (it was a very bad decision back than). An alternative is the office style macros way: "do you want to enable..." which IMHO annoys everyone. Also considering more and more server based applications, like the cadracks effort, a always valid document sheme that is secure by definition should be a priority.

So as expressions are a core feature personally I would opt for having always limited to a scope hat allows safe execution for all users. Everything else, like advanced computations, should stay within scripts.

--> Hopefully I do not sound too negative, I like the overall idea very much! With the two mentioned points sorted out (to whatever outcome the community decides) I think this benefits freecad.

p.s.: many of my security concerns come from reading this (rather long but insigthful) discussion about the failure of pysandbox. They do recommend your approach of having a own interpreter, but it seems allowing any other python modules poses a severe risk. Anyway, the ways to break the sandbox are way over my understanding of the language, hence I'm not sure I can judge anything to be safe or not...
realthunder
Veteran
Posts: 2190
Joined: Tue Jan 03, 2017 10:55 am

Re: Development in Expression and Spreadsheet

Post by realthunder »

ickby wrote: Thu Apr 25, 2019 6:45 am How much is the benefit? or the other way around, would it still be feasible to use the improvements without the patch or would it be unbearingly slow? The reason I ask is for future maintenance: there will come the point when one needs to rerun bison, to fix some bug or make a improvement or whatever, and than the patch may already be invalid due to changes in the bison code... If it is too slow with default bison I see a huge risk for the future
First of all, the patch does not touch bison's C code. It only changes its code generation template. That patch can be applied to a bison installation without recompilation, assuming the installed template doesn't change too significantly.

I didn't do any serious performance benchmark on parsing yet. The performance gain on upstream expression syntax is likely not that much. But for the extended syntax, I'm pretty sure it is worthwhile, as the official bison makes a copy of the stack value on each push, and there are so many more rules in the new syntax. Another reason is that the upstream uses raw pointer, and vectors of raw pointers, which makes it very hard to not leak. I changed it to std::unique to prevent leaking, but the downside is that it requires std::move to make it work. One option is to change it to shared_ptr. It will be a bit slower but will probably just work with official bison. I actually considered using shared_ptr for the syntax tree inside expression, but found that to be not worthwhile at the time.
So as expressions are a core feature personally I would opt for having always limited to a scope hat allows safe execution for all users. Everything else, like advanced computations, should stay within scripts.
In fact, I myself have similar concern. Apart from module import, allow calling into Python object may also be risky to some extent. I am still contemplating where to draw the line, and how to draw it.
Try Assembly3 with my custom build of FreeCAD at here.
And if you'd like to show your support, you can donate through patreon, liberapay, or paypal
User avatar
saso
Veteran
Posts: 1920
Joined: Fri May 16, 2014 1:14 pm
Contact:

Re: Development in Expression and Spreadsheet

Post by saso »

Could we use something like this and would it be useful for us to have a stronger sandbox?

https://security.googleblog.com/2019/03 ... d-api.html
https://github.com/google/sandboxed-api
Mark Szlazak
Posts: 439
Joined: Tue Apr 04, 2017 6:06 pm
Location: SF Bay Area, California

Re: Development in Expression and Spreadsheet

Post by Mark Szlazak »

PyPy sandbox is also mentioned often as effective.
realthunder
Veteran
Posts: 2190
Joined: Tue Jan 03, 2017 10:55 am

Re: Development in Expression and Spreadsheet

Post by realthunder »

saso wrote: Fri Apr 26, 2019 8:11 pm Could we use something like this and would it be useful for us to have a stronger sandbox?

https://security.googleblog.com/2019/03 ... d-api.html
https://github.com/google/sandboxed-api
Interesting, I'll take a close look to get some idea. I actually used API injection technique on some projects years ago. Unfortunately, this sandboxed-api does not support Windows yet.

Mark Szlazak wrote: Fri Apr 26, 2019 9:33 pm PyPy sandbox is also mentioned often as effective.
PyPy sandbox requires running code inside a secondary process. So unless FC uses the same Python interpret, there can't much interactivity between them. My expression extension is actually more comparable to pysandbox, but with one big difference. I interpret most of the expression code with my own interpreter, and that part should be safe, as it has no system calls, assuming it does not segfault. There is only very few entry points to go into Python side that has potential security problems, such as function calls and attribute accessing. My current counter measure is similar to pysandbox, by blacklisting various python functions and modules. A more secure approach is to use some thing like 'sandboxed-api' to block system call before calling into Python code, and then turn off the block after exiting.
Try Assembly3 with my custom build of FreeCAD at here.
And if you'd like to show your support, you can donate through patreon, liberapay, or paypal
Post Reply