I investigated some more. It seems that certain applications do not handle the 4-byte Unicode encoding correctly, although it is perfectly valid according to the standard. Some Unicode encodings are variable-length, and there you would use the shortest form necessary, i.e. not pad with leading zeros. But ISO 10303-21 does not explicitly mention such a requirement (only for 1-byte Unicode), so using the 4-byte encoding throughout should be perfectly valid. I saw that Thomas made a fix in IfcOpenShell to use the 2-byte encoding where possible. It should make it into FreeCAD once there is a reasonably stable IfcOpenShell release.
The code points above 0xFFFF mostly cover rare and ancient scripts as well as additional graphical symbols. They will rarely occur in real-world scenarios, unless someone desperately wants to put 0x1D11E (𝄞) in their company name or something like that.
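To make the difference concrete, here is a minimal Python sketch of the two escape forms as I understand them from ISO 10303-21: `\X2\` followed by 4 hex digits per character, and `\X4\` followed by 8 hex digits per character, each terminated by `\X0\`. This is not the IfcOpenShell code; the function name `encode_step_string` and the `prefer_short` flag are made up for illustration, and for simplicity every non-ASCII character gets its own escape sequence instead of merging runs.

```python
def encode_step_string(text: str, prefer_short: bool = True) -> str:
    """Escape a string for a STEP (ISO 10303-21) file.

    Hypothetical helper for illustration only, not the IfcOpenShell API.
    prefer_short=True uses \\X2\\ wherever the code point fits in 16 bits;
    prefer_short=False uses \\X4\\ for every non-ASCII character.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "'":
            out.append("''")        # apostrophes are doubled
        elif ch == "\\":
            out.append("\\\\")      # backslashes are doubled
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)          # printable ASCII passes through unchanged
        elif prefer_short and cp <= 0xFFFF:
            out.append("\\X2\\%04X\\X0\\" % cp)   # 2-byte form, 4 hex digits
        else:
            out.append("\\X4\\%08X\\X0\\" % cp)   # 4-byte form, 8 hex digits
    return "".join(out)


if __name__ == "__main__":
    print(encode_step_string("Müller"))                      # M\X2\00FC\X0\ller
    print(encode_step_string("Müller", prefer_short=False))  # M\X4\000000FC\X0\ller
    print(encode_step_string("𝄞"))                           # \X4\0001D11E\X0\
```

Both outputs for "Müller" describe the same character; the second one is the padded 4-byte form that some applications apparently fail to read. The clef has no 2-byte form at all, so it always ends up in `\X4\`.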