Coding style & idiomatic Python

matthijskooijman · Post by **matthijskooijman** » Thu May 13, 2021 3:23 pm

I've been doing a lot of digging in the code lately (mostly Draft/Arch Python code), and I've noticed some recurring patterns in the FreeCAD codebase, that I think could be improved into code that is more idiomatic python, more concise, easier to read and probably faster too here and there. I highly suspect that some of the people writing this code have limited background in Python and are simply unaware of some of the Pyhon features that could help here.

In my local checkout I've done a few small of these improvements, but I'm a bit hesitant to submit them as a pull request, since it's essentially changes that do not improve the functionality at all, do require some non-negligable effort to change and review, and I'm not so sure if such changes would be welcomed. Also, there's a thin line between objectively improving the code, and forcing my own ideas of beatiful code on others.

However, regardless of *changing* the current code in such a way is welcome, I suspect that it would be good to at least flag such issues, so people can form an opinion (maybe some of these things are intentionally simple) and maybe use my suggestions at least in new code that is written. So below, I'll describe some of these patterns that I've found. I'll try to do this using a diff showing the improvement, to hopefully make things easy to understand.

Let me know what you think, both of the examples/patterns themselves (to see whether we should apply them), as well as of the form of my suggestions (to see if I should offer more when I find them).

1. hasattr() or getattr()

In a lot of places, hasattr() is used to check whether a property exists, and if so it is queried using two nested ifs as shown below. Often, there is also an else to handle the case where the attribute does not exist, which is often identical to inner if. A lot more concise is to use the getattr() function, which retrieves an attribute and if the attribute does not exist, returns a configurable default (the second argument). Consider below:

Code: Select all

--- a/src/Mod/Draft/draftobjects/wire.py
+++ b/src/Mod/Draft/draftobjects/wire.py
@@ -102,10 +102,7 @@ def execute(self, obj):
             if obj.Base.isDerivedFrom("Sketcher::SketchObject"):
                 shape = obj.Base.Shape.copy()
                 if obj.Base.Shape.isClosed():
-                    if hasattr(obj,"MakeFace"):
-                        if obj.MakeFace:
-                            shape = Part.Face(shape)
-                    else:
+                    if getattr(obj,"MakeFace",True):
                         shape = Part.Face(shape)
                 obj.Shape = shape
         elif obj.Base and obj.Tool:

Note that after this change, the part.Face(shape) call happens only once, and it is now easier to see that the default value of MakeFace is True. Also, for cases where there was no else clause (i.e. when the attribute is not present, do nothing), the default value could be made False.

The default value can of course be any value, so sometimes it can help to use it with an empty list, e.g.:

Code: Select all

--- a/src/Mod/Arch/ArchComponent.py
+++ b/src/Mod/Arch/ArchComponent.py
@@ -1426,14 +1426,12 @@ def claimChildren(self):
                     c = []
                 else:
                     c = [self.Object.Base]
-            if hasattr(self.Object,"Additions"):
-                c.extend(self.Object.Additions)
+            c.extend(getattr(self.Object, "Additions", [])
-            if hasattr(self.Object,"Subtractions"):
-                for s in self.Object.Subtractions:
+            for s in getattr(self.Object, "Subtractions", []):
                     if Draft.getType(self.Object) == "Wall":
                         if Draft.getType(s) == "Roof":
                             continue

For clarity, I've omitted whitespace-only changes.

Note that the above makes two changes, one which takes advantage of the fact that extend([]) is a no-op, the second of the fact that looping over an empty list is a no-op.

In addition to make the code shorter, these changes also help to reduce the nesting level, which often also makes the code easier to read.

2. Nested if or combined with and

In a lot of places, there's a lot of nested ifs. When these ifs contain nothing else than another if, they're essentially just a single if with a logical and of their conditions, and I would usually write them as such. For example:

Code: Select all

--- a/src/Mod/Arch/ArchComponent.py
+++ b/src/Mod/Arch/ArchComponent.py
@@ -737,9 +737,7 @@ def processSubShapes(self,obj,base,placement=None):
                         base = base.fuse(add)
 
                     elif hasattr(o,'Shape'):
-                        if o.Shape:
-                            if not o.Shape.isNull():
-                                if o.Shape.Solids:
+                        if o.Shape and not o.Shape.isNull() and o.Shape.Solids:
                                     s = o.Shape.copy()
                                     if placement:
                                         s.Placement = s.Placement.multiply(placement)

For clarity, I've omitted whitespace-only changes again.

Again, this reduces nesting level, making the code easier to read. Also the fact that the and operator is used makes it IMHO more clear what the code does. There is a tradeoff here: Nesting is reduced, but line length is increased. For even longer expressions, you can use parenthesis around the condition to make it span multiple lines, or introduce variables for parts of the expression. Note that that increases the line count again, but not the nesting level.

3. Duplicate code in if branches

Here's another pattern, which is essentially similar to the 1. above, where multiple if branches have the same code. This makes it harder to quickly see that the same thing happens in two situations, and also makes the code prone to breakage when one place is changed and the other is not. Consider:

Code: Select all

--- a/src/Mod/Arch/ArchComponent.py
+++ b/src/Mod/Arch/ArchComponent.py
@@ -1420,9 +1418,7 @@ def claimChildren(self):
         if hasattr(self,"Object"):
             c = []
             if hasattr(self.Object,"Base"):
-                if Draft.getType(self.Object) != "Wall":
-                    c = [self.Object.Base]
-                elif Draft.getType(self.Object.Base) == "Space":
+                if Draft.getType(self.Object) == "Wall" and Draft.getType(self.Object.Base) == "Space":
                     c = []
                 else:
                     c = [self.Object.Base]

It might take a bit of effort to see that both versions are in fact equivalent, but IMHO the new version makes it more clear that a Wall based on a Space is the special case here, and all others are treated equal. Note that because of the c = [] a few lines earlier, this can be made even shorter:

Code: Select all

             c = []
             if hasattr(self.Object,"Base"):
-                if Draft.getType(self.Object) != "Wall":
-                    c = [self.Object.Base]
-                elif Draft.getType(self.Object.Base) == "Space":
-                    c = []
-                else:
+                if not (Draft.getType(self.Object) == "Wall" and Draft.getType(self.Object.Base) == "Space"):
                     c = [self.Object.Base]

And then improvement 2 can also be applied, though I'm not entirely sure if that would make things better or not.

4. Using set() to prevent duplicates

In some places, a list must be produced without duplicates, which then often produces a list, and then produces a second list checking for duplicates. This takes a lot of code, and is not quite efficient, especially for longer lists. In Python the `set` type can be used instead, which is a container, like list, but cannot contain duplicate items. Removing duplicates from a list by simply converting it to a set is a common idiom in Python:

Code: Select all

--- a/src/Mod/Draft/draftutils/groups.py
+++ b/src/Mod/Draft/draftutils/groups.py
@@ -245,13 +245,7 @@ def get_group_contents(objectslist,
                 if walls:
                     newlist.extend(get_windows(obj))
 
-    # Clean possible duplicates
-    cleanlist = []
-    for obj in newlist:
-        if obj not in cleanlist:
-            cleanlist.append(obj)
-
-    return cleanlist
+    return set(newlist)

One caveat is that a set is unordered (it uses a hashtable internally, which is for a large part why it is faster). If the order of items is important (it might be in the above example, I'm not entirely sure), an alternative is to use the OrderedDict (with dummy values and the items in the keys), or a maybe a custom datatype that uses set for uniqueness and list for ordering (there's a few available on pypy).

Also, the above diff does change the return type from list to set. This is often a good idea, since the fact that it produces unique values is part of the interface (especially if that also means they're unordered), but if a list return type is required, you can simply use the list constructor to convert the set after producing it.

Finally, the above first builds a list and then uses a set to remove duplicates, but it would of course be even cleaner to just produce a set in the first place (but that would be a bigger diff, so I didn't for this example).

5. Using generators and yield instead of building lists

There is a lot of code that produces a list of things, which is then iterated over by the caller once. For these cases, Python supports generators / generator functions. These are function that do not produce a list, but a generator that can produce elements one by one. When iterating over such a generator, there is no need to first calculate the complete list and return that, but instead the for loop in the caller just starts executing and whenever it needs the next value, the generator runs for a bit to produce the next value. This saves a lot of memory and allocation, and makes code simpler because there is less boilerplate for initializing a list and returning it. Generators are also automatically nested: If you call a generator function that internally calls another generator function, iterates over its result, and yields a modified version of each element, then whenever the caller needs a new value from the outer generator function, that will ask the inner generator for the next value, modify it and then yield the result, still no lists need to be allocated.

For example:

Code: Select all

--- a/src/Mod/Draft/draftutils/groups.py
+++ b/src/Mod/Draft/draftutils/groups.py
@@ -66,16 +66,12 @@ def get_group_names(doc=None):
 
     if not found:
         _err(_tr("No active document. Aborting."))
-        return []
+        return
 

-    glist = []
-
     for obj in doc.Objects:
         if (obj.isDerivedFrom("App::DocumentObjectGroup")
                 or utils.get_type(obj) in ("Floor", "Building", "Site")):
-            glist.append(obj.Name)
+            yield obj.Name
-
-    return glist

Here, the yield operator returns the next element and suspends execution of the function. When the next element is needed, execution resumes after the yield, again and again until the end of the function is reached (or a return is encountered).

One caveat is that a generator is not a list, so it means that it cannot be indexed and can be iterated only once. In most cases, this is exactly what happens, so this should be enough (though there is the issue of backward compatibility with third party code that might prevent converting things to generators immediately). If you have a generator but need indexing or multiple iteration, then you can always just use the list() constructor to convert a generator to a list.

Post by **chennes** » Thu May 13, 2021 5:18 pm

matthijskooijman wrote: ↑Thu May 13, 2021 3:23 pm I'm a bit hesitant to submit them as a pull request, since it's essentially changes that do not improve the functionality at all, do require some non-negligable effort to change and review, and I'm not so sure if such changes would be welcomed.

I will say that in my experience, the devs here are pretty friendly people, and as long as you discuss things here on the forums first you are unlikely to offend anyone with a PR of this sort

. But keep the PRs focused! Personally, I can review 100 line PRs all day long. But those 3k line PRs are beastly!

FWIW, in my opinion, changes that reduce the number of lines of code are almost always worth the review time: in the long run, we all spend SO much of our time just reading code! Making things shorter, less complex, and more idiomatic is well worth the effort of the code review. If someone is willing to take the time to make the changes, I for one am happy to review them.

carlopav · Post by **carlopav** » Thu May 13, 2021 7:55 pm

I agree with @chennes! And i'm always for improvements of the readability of the code.
Thanks also for the detailed post, i learned more than a couple of things on python (also if point 4 and 5 are still unclear to me

, but i'll give them a try soon i hope).

Post by **bernd** » Thu May 13, 2021 9:20 pm

your changes makes sense to me too.

Go for a PR. May be do not change all Draft and Arch. Start with just a few and see what Yoriks opinion is. He is the one who merges all Arch and Draft. If he is with your, you can change them all.

Post by **Kunda1** » Sat May 15, 2021 6:09 am

Super informative/educational post and very invested! I say go for the PR using the feedbacm given so far by others in the thread.

matthijskooijman · Post by **matthijskooijman** » Wed May 19, 2021 9:26 pm

Thanks for all your encouraging comments, let's see if I can find some time to fix some of these in a PR, then.

chennes wrote: ↑Thu May 13, 2021 5:18 pm I will say that in my experience, the devs here are pretty friendly people, and as long as you discuss things here on the forums first you are unlikely to offend anyone with a PR of this sort :). But keep the PRs focused! Personally, I can review 100 line PRs all day long.

I was not so much worried about offending people, more about overloading them with more review work, given the current stack of PRs is already quite high and cleanup PRs like these tend to make a lot of mechanical changes in a lot of places, which also tend to cause conflicts with other PRs, so they tend to annoy other contributors if they are merged quickly, and produce a lot of work to keep them up to date if a lot of other things are merged first. But let's start small and we'll see :-)

chennes wrote: ↑Thu May 13, 2021 5:18 pm FWIW, in my opinion, changes that reduce the number of lines of code are almost always worth the review time: in the long run, we all spend SO much of our time just reading code! Making things shorter, less complex, and more idiomatic is well worth the effort of the code review. If someone is willing to take the time to make the changes, I for one am happy to review them.

I very much agree there.

Coding style & idiomatic Python

Coding style & idiomatic Python

Re: Coding style & idiomatic Python

Re: Coding style & idiomatic Python

Re: Coding style & idiomatic Python

Re: Coding style & idiomatic Python

Re: Coding style & idiomatic Python