Blog

What you should know about floating-point

Date: November 29, 2018

Wolfgang Meincke

Stuttgart, Germany

Floating-point code becomes increasingly popular
Despite all advantages, floating-point mathematics also brings new challenges compared to fixed-point
This article shows some interesting facts that you might not have known

Floating-point code becomes more and more common in today’s automotive industry. Since development is often done model-based using tools such as Simulink and dSPACE TargetLink, the model serves as an additional abstraction layer and therefore is an additional stage in the development and testing process. In comparison to fixed-point, floating-point comes up with several advantages. The biggest advantage, from most developers’ point of view, is to get rid of the scaling task. There is also a better detection of infinite and NaN (not a number), represented by special values. Beyond that, it seems that the precision increases and almost enables to represent the physical behavior (let’s see if that’s true). Finally, the range of representable values is much larger (which also comes with a downside).

So, looking at all the advantages, why not switch everything to floating-point?

Consider the following characteristics, before you decide to use floating-point:

Accuracy

Let’s say there is an unsigned fixed-point integer with a scaling (=resolution) of 0.01. Based on this assumption, the calculation 0.01 + 0.01 will naturally result in exactly 0.02. Assigning the same calculation using floating-point (single precision), 0.01 + 0.01 will result in 0.0199999995529651641845703125. It’s close to 0.02 but not the exact value. If we have a look at the value 0.01, we can see that it is neither represented in an exact way. This becomes clear if we look how the value 0.01 is really represented in floating-point.

Structure of a floating-point variable

A floating-point value is made up of 3 elements; 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa totaling 32 bits. To represent a certain number, the mantissa represents a certain value with the exponent. In the case of 0.01, it looks like this:

If we calculate 1.2799999713897705 * 2^-7 it results in 0.00999999977648258209228515625, the closest value in float to 0.01. This relative error due to rounding, is called the machine epsilon.

The first conclusion is, if you need an exact representation of values for things like loop counters or checking values for equality, floating-point code is not the right choice.

Whereas half of the representable values of a float variable range between -1 and 1, accuracy decreases for larger values. With a float (single precision), the number 16.777.217 cannot be represented and depending on the rounding mode of the compiler, instead, this will lead to 16.777.216 or 16.777.218. For larger numbers, the gap between each representable value is even larger.

To summarize, floating-point variables do not bring more precision compared to fixed point. However, they bring a precision which dynamically changes across the value range; small numbers have high precision, large numbers have less precision. If your variables stay inside a clearly defined value range, similar (or even better) precision is possible with a constant scaling in fixed-point.

Comparison

Being aware of the discoveries in the previous section, there is also an effect on comparisons. There are different situations, where comparisons are done.

Compare a variable and a constant value
Loop counter
Switch-Case operations

In these use cases, an exact comparison is needed, otherwise, this might lead to unintended behavior in the floating-point code. In other use cases like greater/less comparisons, the floating-point precision can be suitable. However, if an exact and transparent behavior is intended, the inaccuracy might lead to a smaller or greater value than expected as a result of a calculation and therefore the wrong decision is made. For these use cases, the usage of fixed-point is the first choice.

Influence of the compiler for floating-point code

Using floating-point code, the compiler has a much higher impact on the behavior of the code compared to fixed-point. This becomes more important for model-based development, where usually three stages of implementation are taken into account. These are model-in-the-loop (MIL), software-in-the-loop (SIL) and processor-in-the-loop (PIL, target object code). MIL usually represents the values with double precision. Most of the target processors in embedded software are based on 32-bit, so the SIL implementation is typically done in single precision. Based on the data type used, this might lead to different behavior in certain situations on MIL and SIL/PIL. In addition, there are three main influence factors that you should be aware of:

Precision

Let’s assume the following code:

double v = 1 x 10³⁰⁸;
double x = (v * v) / v;

The expected result of the second line of code is +∞. Even though this might be correct, sometimes the result can be x = v, if the processor is able to internally calculate with 80-bit precision. This can have an effect on development and testing on different implementation levels.

Optimization

Algebraic laws do not hold in general in finite precision arithmetic like

Although many compilers give the opportunity to control compiler optimizations, such value-changing compiler optimizations can be enabled by default to improve efficiency and/or reduce size like for the Intel® Compiler.

Rounding

If the number of available digits is not sufficient to represent a real number precisely, rounding is used to represent a close value

double a = 0.1;
double x = (a * -a) * (a * a);

The result for x in the second line of code depends on the used rounding mode:

This means that depending on the compiler settings and the capabilities of the underlying CPU, the results of calculations might be different between different compilers and CPUs. This can even lead to slightly different code coverage results between SIL and PIL floating-point code.

Conclusion

To answer the question of why not switching everything to floating-point code, there is a wide range of use cases, that will improve and get easier by using floating-point variables. However, there are certain use cases where using floating-point might lead to unintended and untransparent behavior. Furthermore, on different implementation levels like MIL, SIL and PIL the results of floating-point arithmetic might be different due to different compiler settings for precision, optimization and rounding. This makes testing of the target code (PIL) much more important than in fixed-point arithmetic. To deal with these new challenges and to stay ISO 26262 compliant, the test process should consider a structural Back-to-Back test besides the Requirements-based Testing. This should be done between the reference implementation (usually MIL) and the target code (PIL), if available or SIL otherwise.

If you want to dive deeper in the topic of floating-point, I can highly recommend the paper The pitfalls of verifying floating-point computations from David Monniaux.

Please, also have a look at the article of my colleague Markus Gros about What you should know about fixed-point code, so to speak the counterpart to floating-point code.

Wolfgang Meincke

Stuttgart, Germany

Senior Application Engineer
Senior Business Development Manager India

Wolfgang Meincke studied Computer Science at the University Ravensburg-Weingarten where he graduated in 2006. He then worked at EWE TEL GmbH where he was responsible for requirements engineering and project management for software development projects as well as agile software development processes. In 2014 he joined BTC as senior pilot engineer to support the customers to integrate the tools into their development processes as well as giving support and training on test methodologies regarding the ISO 26262 standard. One of his main areas of interest is the formalization of safety requirements and their verification based on formal testing and model-checking technologies for unit test, integration test and HIL real-time-testing.

Connect on LinkedIn

All Videos

Request Evaluation License

If you would like to try out our tools, we will gladly provide an evaluation license free of charge. Evaluations include a free launch workshop and also provide an opportunity for you to meet one-on-one with our support and engineering teams.

Schedule a Meeting

Do you have any questions or want to see our tools in action? If so, please use the link below to schedule a meeting, where a member of our engineering team will be happy to show you the features and use cases and directly answer any questions you might have.

Melden Sie sich für unseren Newsletter an!

Für den Newsletterversand wird Ihre Email Adresse vom DSGVO zertifizierten Dienstleister CleverReach verarbeitet. Weitere Informationen finden Sie in unserer Datenschutzerklärung.

Videos

Discover some of the main features of our products in these short videos.

Test Lösungen für Simulink Modelle und Seriencode

Use Cases

Testumgebungen

Produkte

Blog

What you should know about floating-point

Accuracy

Structure of a floating-point variable

Comparison

Influence of the compiler for floating-point code

Precision

Optimization

Rounding

Conclusion

Popular Videos

Request Evaluation License

Schedule a Meeting

Melden Sie sich für unseren Newsletter an!

Videos

Produkte

Unternehmen

News

Wissen

Kontakt

Produkte

Unternehmen

News

Wissen

Kontakt

Test Lösungen für Simulink Modelle und Seriencode

Use Cases

Testumgebungen

Produkte

Blog

What you should know about floating-point

Accuracy

Structure of a floating-point variable​

Comparison

Influence of the compiler for floating-point code

Precision

Optimization

Rounding

Conclusion

Popular Videos

Request Evaluation License

Schedule a Meeting

Melden Sie sich für unseren Newsletter an!

Videos

Produkte

Unternehmen

News

Wissen

Kontakt

Produkte

Unternehmen

News

Wissen

Kontakt

Structure of a floating-point variable