Monday, September 15, 2025

Asynchronous Coding in Java Using CompletableFuture

A recent hobby project to code an Android app to monitor cellular and wifi connectivity glitches involved a significant amount of coding using asynchronous callback patterns. The Android operating system provided its own challenges that required some experimentation and research to resolve. On the other hand, the basic mechanics of offloading work within a Java application to separate background threads triggered a significant portion of the work.

The CompletableFuture class in Java is not especially difficult to use but, like many other techniques in the Java realm, the examples available online are often confusing, if not contradictory. What follows is an attempt at an introduction that is concise enough not to overwhelm yet complete enough to identify the pitfalls and subtleties that can lead to hours of puzzlement when the class is only partially understood.


The Problem with Synchronous Calls

Modern computers and handheld devices such as smartphones and tablets have tremendous processing power in their CPUs but are expected to support DOZENS of applications running simultaneously. Even with multi-core CPUs, not EVERYTHING can literally run simultaneously so, instead, the operating systems for these devices are designed to suspend and resume processing of different applications very efficiently. This context switching is normally performed so quickly that human users are left with the ILLUSION that all of their apps ARE in fact running simultaneously with each application enjoying the full attention of the operating system and hardware.

This illusion only works if all the running applications obey minimum expectations of the operating system and return control TO the operating system on a VERY consistent schedule. This is accomplished by collecting user input (clicks, keypresses, touches) via interrupt-driven mechanisms that are monitored by the application's main event loop. If no new inputs are detected, the main loop returns execution control to the OS, which gives every other running application time for the same main event loop checks.

This illusion breaks if one of the running applications gets its time slice, sees pending inputs it needs to process, then fires off a call to handle one of those inputs that takes an inordinate amount of time to complete. During that wait period, the app isn't doing anything else: the thread running its event loop is stuck inside the call, so it cannot service new input or redraw the screen, and to the user the app (and sometimes the whole device) appears to lock up, hang or crash.

As a simple example of the problem, imagine an application that deals with three types of business objects modeled as BusinessObjectX, BusinessObjectY and Location as shown here.

    static class BusinessObjectX {
        public int id;
        public String name;

        public BusinessObjectX() {};

        public String toString() {
            return "BusinessObjectX = [ id=" + id + " name=" + name + "]";
        }
    }

    static class BusinessObjectY {
        public String maker;
        public String model;

        public BusinessObjectY() {};

        public String toString() {
            return "BusinessObjectY = [ maker=" + maker + " model=" + model + "]";
        }
    }


    static class Location {
        public double latitude;
        public double longitude;
        public double altitude;

        public Location() {};

        public String toString() {
            return "Location = [ latitude=" + latitude + " longitude=" + longitude
            + " altidude=" + altitude + "]";
        }
    }

Imagine that application has two methods that accept those objects, make updates to them via some imaginary process, then return a response. One method, actionX(), modifies its input object and returns a String. The other method, actionY(), modifies its input object but returns a Location object. The method for X might look like this:

    public static String actionX(BusinessObjectX objectX) {
        thisLog.info("actionX() - starting / doing \"work\" for 7 seconds");
        try {
            TimeUnit.SECONDS.sleep(7);            
        }
        catch (InterruptedException theE) {
            Thread.currentThread().interrupt();
            thisLog.info("actionX() - thread was interrupted during sleep");
        }
        thisLog.info("actionX() - setting current datetime in delta object");
        objectX.id=2112;
        objectX.name="Rush";
        // this return value is returned as the result of the CompletableFuture that
        // wrapped this call
        return "Permanent Waves";
    }

The method for Y might look like this:

    public static Location actionY(BusinessObjectY objectY) {
        thisLog.info("actionY() - starting / doing \"work\" for 3 seconds");
        objectY.maker="Toyota";
        objectY.model="4Runner";
        try {
            TimeUnit.SECONDS.sleep(3);            
        }
        catch (InterruptedException theE) {
            Thread.currentThread().interrupt();
            thisLog.info("actionY() - thread was interrupted during sleep");
        }
        Location result = new Location();
        result.latitude=39.9999;
        result.longitude=-70.999;
        result.altitude=140.888;
        // this return value is returned as the result of the CompletableFuture that
        // wrapped this call
        return result;
    }

If the application calls those two tasks in sequence, that calling code might look like this.

        thisLog.info("============= SYNCHRONOUS / SEQUENTIAL PROCESSING (BAD) ========");

        // initialize input objects with some values
        BusinessObjectX testX = new BusinessObjectX();
        testX.id=42;
        testX.name="Yes";
        BusinessObjectY testY = new BusinessObjectY();
        testY.maker="Ford";
        testY.model="F150";        

        // call these methods serially, sequentially and show inputs can be altered and
        // results are returned after 10 seconds
        String testA = actionX(testX);
        thisLog.info("asyncActionA() completed - testA = " + testA.toString());
        Location testB = actionY(testY);
        thisLog.info("asyncActionB() completed - testB = " + testB.toString());

Notice what happens from reviewing the log messages generated by these calls:

2025-09-15 13:05:38:144 [main] INFO FUTURE - ====== SYNCHRONOUS / SEQUENTIAL PROCESSING (BAD) ========
2025-09-15 13:05:38:145 [main] INFO FUTURE - actionX() - starting / doing "work" for 7 seconds
2025-09-15 13:05:45:146 [main] INFO FUTURE - actionX() - setting current datetime in delta object
2025-09-15 13:05:45:148 [main] INFO FUTURE - asyncActionA() completed - testA = Permanent Waves
2025-09-15 13:05:45:148 [main] INFO FUTURE - actionY() - starting / doing "work" for 3 seconds
2025-09-15 13:05:48:159 [main] INFO FUTURE - asyncActionB() completed - testB = Location = [ latitude=39.9999 longitude=-70.999 altidude=140.888]

The two methods are invoked sequentially, as expected. The first takes 7 seconds as expected. The second takes 3 seconds as expected. It takes 10 seconds overall to get both results as expected. The problem is that for TEN SECONDS, the calling thread (here, main) could do nothing else while waiting for those tasks to complete, even though they were only waiting on fake sleep calls.


CompletableFuture In Theory

The CompletableFuture class was added in Java 8 released in March 2014 as a means of providing a standardized approach for shunting potentially long-lived work sequences to alternate execution threads within a JVM. In concept, a CompletableFuture (henceforth a "future" for brevity and readability) acts much like a ticket in a fast food restaurant that is generated by the cashier and transmitted to the kitchen to trigger preparation and assembly of an ordered meal. In a restaurant, a "grill ticket" assists with these tasks:

  • documents the requesting register or unique customer identifier
  • identifies the work required in the kitchen (burger, no onions, large fries, large shake)
  • allows the cashier to proceed with other tasks for the current customer such as payment
  • allows that customer to step aside for the remaining wait while the register handles another customer
  • when the kitchen work is returned, the work is paired with the ID, the ID is read by the cashier and the output is picked up by the customer who proceeds with the next task

The CompletableFuture class fills a similar role for the JVM running an application. It provides these capabilities:

  • assigns a unique identifier for a task defined by a block of code
  • specifies the type of result expected to be returned by that code
  • maps the unique identifier back to the parent program asking for the asynchronous work
  • hands the future request to a separate thread manager within the JVM
  • allows that thread manager to find an available thread and queue up the work
  • watches the response queue from the thread manager for a result for that future
  • executes any other handling logic specified by the future on the result
  • resumes the parent program process and passes it the result

There are three distinct categories of methods provided by CompletableFuture for the various phases of orchestration work.

  • those used to identify the work needing asynchronous processing -- the .supplyAsync(), .allOf() and .anyOf() methods
  • those supplying special exception or timeout criteria to the future -- the .exceptionally() and .completeOnTimeout() methods
  • those that retrieve or consume the result -- the .join() and .get() methods, which block the calling thread until the future completes, and the .thenApply() method, which registers follow-on logic to run on the result without blocking the caller

There ARE more methods in the class but the examples below using just these will provide a productive introduction for using them in a real application.
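
As a rough illustration of how the three categories fit together, here is a minimal sketch. The class name MethodCategoriesSketch and the slowLookup() helper are hypothetical stand-ins invented for this illustration and are not part of the example project. The sketch assumes the default common thread pool is used.

```java
import java.util.concurrent.CompletableFuture;

class MethodCategoriesSketch {

    // Hypothetical stand-in for slow work; not part of the original example code.
    static String slowLookup() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "raw";
    }

    static String run() {
        // Category 1: identify the work. supplyAsync() returns a future right away.
        CompletableFuture<String> work =
            CompletableFuture.supplyAsync(MethodCategoriesSketch::slowLookup);

        // Category 2: attach fallback behavior. This also returns a new future right away.
        CompletableFuture<String> guarded = work.exceptionally(ex -> "fallback");

        // thenApply() registers a transformation and returns without waiting.
        CompletableFuture<String> transformed = guarded.thenApply(String::toUpperCase);

        // Category 3: join() makes the calling thread wait until a value exists.
        return transformed.join();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints RAW
    }
}
```

Only the final .join() makes the caller wait; every earlier call returns a future immediately.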


CompletableFuture In Practice

The sections below provide working examples of uses of CompletableFuture. The logic in these examples was explicitly coded to optimize for all of these considerations:

  1. Being as visually brief as possible so the structure of the techniques isn't lost amid 45 lines of code
  2. Showing how variables can be passed in through a future and how changes within the future are still reflected in the passed object upon return
  3. Generating logs at key points to illustrate where control is changing hands from the parent code block to the side-bar future threads and back
  4. Allowing direct comparisons between these approaches to illustrate where calling conventions need to be altered based on how a future is being created or where exception handling will be implemented.
  5. Allowing individual examples to be pasted into a test project with as little modification as possible while still yielding working code.

All of these examples were wrapped in a single Java class named Main in package com.mdhlabs.future with the following import statements:

import org.slf4j.*;
import java.lang.Thread;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

The full source code for all of the examples is included as an appendix at the bottom of this post for convenience.


Making A Single Asynchronous Call

Executing a single block of code asynchronously is the Hello World use case for CompletableFuture. The process involves these coding steps:

  1. create a new object of type CompletableFuture by using the class method .supplyAsync() to provide the code to be executed; this call immediately hands the block to a separate thread managed by the JVM
  2. use one of the future's completion methods (.join() or .get(), for example) to wait for the result, or attach follow-on logic with a method such as .thenApply()

In this example, the code to be run asynchronously is a call to actionX(). Since that method already exists as a standalone method, the lambda within the .supplyAsync() call just returns the result of that method. After providing the code to execute, a variable assignment calls the future's .join() method, which blocks the calling thread until the result is available.

        thisLog.info("====== ASYNCHRONOUS PROCESSING  ========================");
        testX.id=42;
        testX.name="Yes";
        testY.maker="Ford";
        testY.model="F150";
        thisLog.info("Initiating actionX as a single CompletableFuture");
        CompletableFuture<String> singleFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionX(testX);
            }
        );
        thisLog.info("calling singleFuture.join() to wait wihtout blocking for completion");
        String singleResult = singleFuture.join();
        thisLog.info("after singleFuture completion: singleResult="+singleResult);

Now compare the sequential flow of the statements above with the log messages below when the code runs:

2025-09-15 17:37:25:326 [main] INFO FUTURE - ====== ASYNCHRONOUS PROCESSING  ========================
2025-09-15 17:37:25:326 [main] INFO FUTURE - Initiating actionX as a single CompletableFuture
2025-09-15 17:37:25:331 [main] INFO FUTURE - calling singleFuture.join() to wait wihtout blocking for completion
2025-09-15 17:37:25:331 [ForkJoinPool.commonPool-worker-1] INFO FUTURE - actionX() - starting / doing "work" for 7 seconds
2025-09-15 17:37:32:332 [ForkJoinPool.commonPool-worker-1] INFO FUTURE - actionX() - setting current datetime in delta object
2025-09-15 17:37:32:332 [main] INFO FUTURE - after singleFuture completion: singleResult=Permanent Waves

Note that the logs generated within actionX() didn't come from the main thread. They came from a worker thread named ForkJoinPool.commonPool-worker-1, which belongs to the JVM's common thread pool and is handling the actual execution of the logic in actionX(). That's exactly the goal. Also note that the log message prior to the invocation of singleFuture.join() appears BEFORE the logs from actionX(). That ordering is only a scheduling artifact: .supplyAsync() submits the work to the pool immediately, and the worker thread simply needed a few milliseconds to pick it up. The work does NOT wait for a completion method to be called. What .join() does is block the calling thread (here, main) until the future holds a result or an exception. When the worker finishes, the result is stored in the future object, the blocked thread wakes up and execution resumes at the next statement. Because .join() blocks, it should not be called on a GUI thread; there, a non-blocking continuation such as .thenApply() or .thenAccept() is the appropriate way to consume the result.
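
One way to check when the work actually begins is to have the task signal its own start and to wait for that signal before calling .join(). The sketch below does this with a CountDownLatch. The class and method names are hypothetical, and the 2 second wait is an arbitrary allowance for thread scheduling.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class EagerStartSketch {

    // Returns true if the task body began running before join() was ever called.
    static boolean startedBeforeJoin() {
        CountDownLatch started = new CountDownLatch(1);

        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            started.countDown();   // signal that the task body is running
            return "done";
        });

        boolean sawStart;
        try {
            // Wait for the start signal WITHOUT calling join() or get().
            sawStart = started.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            sawStart = false;
        }

        future.join();             // only now wait for the result
        return sawStart;
    }

    public static void main(String[] args) {
        System.out.println("task started before join(): " + startedBeforeJoin());
    }
}
```

If the task only began when a completion method was called, the latch would never count down during the wait. In practice the task starts as soon as .supplyAsync() submits it.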


Making Parallel Asynchronous Calls

A key benefit of the CompletableFuture mechanism is the ability to combine two or more "child" futures into a "parent" future which returns a result from the children based on how the parent was created. Two methods are provided for two use cases:

  • method .allOf() - returns when ALL specified children have completed via result or exception
  • method .anyOf() - returns when the FIRST child future returns a result or exception

When the allOf() method is used, ALL of the futures specified in creating the new future WILL be executed to their normal or exceptional completion. That new "allOf" future itself will not house the individual results or exceptions of the original futures; those have to be accessed via their individual future objects.

When the anyOf() method is used, the response or exception generated by whichever future is FIRST to complete is returned. The futures listed in the anyOf() method do not have to return the same result type, so the anyOf() future is typed as CompletableFuture<Object> and the code calling its thenAccept(), get() or join() has to determine which response type came back before processing the result.
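
A minimal sketch of that type check is shown here. The class name and both child futures are hypothetical. The second child is deliberately never completed so the outcome is predictable.

```java
import java.util.concurrent.CompletableFuture;

class AnyOfSketch {

    static String describeFirst() {
        // Two hypothetical children with different result types.
        CompletableFuture<String> fast = CompletableFuture.supplyAsync(() -> "text result");
        CompletableFuture<Integer> neverDone = new CompletableFuture<>(); // never completed here

        // anyOf() erases the child types: the parent carries a plain Object.
        CompletableFuture<Object> first = CompletableFuture.anyOf(fast, neverDone);
        Object value = first.join();

        // The caller must work out which child answered.
        if (value instanceof String) {
            return "String: " + value;
        } else if (value instanceof Integer) {
            return "Integer: " + value;
        }
        return "unexpected type";
    }

    public static void main(String[] args) {
        System.out.println(describeFirst()); // prints String: text result
    }
}
```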

Because of these complexities, the CompletableFuture class is NOT a suitable basis for implementing a complex, long lived task orchestration engine. There is no support for the concept of a transaction that allows ParentFuture to kick off ChildFutureA, ChildFutureB and ChildFutureC over the course of seconds or minutes, encounter a failure from ChildFutureB and roll back work initiated by A or C. CompletableFuture is ONLY appropriate for moving web service calls or "callback" invocations out of GUI execution threads.

In this example, both actionX() and actionY() are launched as futures, then a third future is created using .allOf() to specify the two child futures as a set, then a wait on the two child futures is triggered when the .join() method of the parent future is called. After a response is detected for the parent future, its .thenRun() method is used to collect information from the two children to formulate a final result returned to the application.

        thisLog.info("====== PARALLEL FUTURES ================================");
        testX.id=42;
        testX.name="Yes";
        testY.maker="Ford";
        testY.model="F150";  
        thisLog.info("Initiating actionX via CompletableFuture");
        CompletableFuture<String> xFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionX(testX);
            }
        );
        thisLog.info("Initiating actionY via CompletableFuture");
        CompletableFuture<Location> yFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionY(testY);
            }
        );
        thisLog.info("Creating resultFuture CompletableFuture waiting on aFuture and bFuture");
        CompletableFuture<Void> resultFuture = CompletableFuture.allOf(xFuture, yFuture);
        thisLog.info("invoking resultFuture.join() - waiting for x and y to complete");
        resultFuture.join();

        resultFuture.thenRun( () -> 
           {
               thisLog.info("resultFuture.thenRun() - combining async results from aFuture and bFuture");
               // statements can reference variables passed TO the async methods through the CompletableFuture
               thisLog.info("testA = " + testA.toString());
               thisLog.info("testB = " + testB.toString());
               // statements can reference response values returned through the CompletableFutures
               String xFutureResponse = xFuture.join();
               Location yFutureResponse = yFuture.join();
               thisLog.info("xFutureResponse <String> = " + xFutureResponse);
               thisLog.info("yFutureResponse <Location> = " + yFutureResponse);
           }
        );

When combining futures into a new parent future with .allOf(), the type of the parent future is always CompletableFuture<Void> (a parent created with .anyOf() is a CompletableFuture<Object> instead). When the parent future completes, logic provided to its .thenRun() method must individually collect the results or exceptions from the child futures and decide how to merge them together into a final result.
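
If the merged value needs to be handed back to the caller rather than just logged, one option is to chain .thenApply() onto the .allOf() parent so the combined result travels in a typed future. This is a sketch under that assumption, with hypothetical names and trivial child tasks.

```java
import java.util.concurrent.CompletableFuture;

class AllOfCombineSketch {

    static String combine() {
        // Hypothetical children with different result types.
        CompletableFuture<String> name = CompletableFuture.supplyAsync(() -> "Rush");
        CompletableFuture<Integer> year = CompletableFuture.supplyAsync(() -> 1980);

        // allOf() yields CompletableFuture<Void>; thenApply() converts it into a typed result.
        CompletableFuture<String> combined = CompletableFuture.allOf(name, year)
            .thenApply(ignored -> name.join() + " (" + year.join() + ")");

        // The child join() calls inside the lambda return at once because both are already done.
        return combined.join();
    }

    public static void main(String[] args) {
        System.out.println(combine()); // prints Rush (1980)
    }
}
```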

Here are log messages generated when this code runs.

2025-09-15 17:37:32:333 [main] INFO FUTURE - ====== PARALLEL FUTURES ================================   
2025-09-15 17:37:32:333 [main] INFO FUTURE - Initiating actionX via CompletableFuture
2025-09-15 17:37:32:334 [main] INFO FUTURE - Initiating actionY via CompletableFuture
2025-09-15 17:37:32:334 [ForkJoinPool.commonPool-worker-1] INFO FUTURE - actionX() - starting / doing "work" for 7 seconds
2025-09-15 17:37:32:335 [main] INFO FUTURE - Creating resultFuture CompletableFuture waiting on aFuture and bFuture
2025-09-15 17:37:32:335 [ForkJoinPool.commonPool-worker-2] INFO FUTURE - actionY() - starting / doing "work" for 3 seconds
2025-09-15 17:37:32:335 [main] INFO FUTURE - invoking resultFuture.join() - waiting for x and y to complete
2025-09-15 17:37:39:335 [ForkJoinPool.commonPool-worker-1] INFO FUTURE - actionX() - setting current datetime in delta object
2025-09-15 17:37:39:342 [main] INFO FUTURE - resultFuture.thenRun() - combining async results from aFuture and bFuture
2025-09-15 17:37:39:343 [main] INFO FUTURE - testA = Permanent Waves
2025-09-15 17:37:39:343 [main] INFO FUTURE - testB = Location = [ latitude=39.9999 longitude=-70.999 altidude=140.888]
2025-09-15 17:37:39:343 [main] INFO FUTURE - xFutureResponse <String> = Permanent Waves
2025-09-15 17:37:39:344 [main] INFO FUTURE - yFutureResponse <Location> = Location = [ latitude=39.9999 longitude=-70.999 altidude=140.888]

Note that even though both the X and Y tasks were executed and took their normal 7 and 3 seconds, execution of both tasks BEGAN within a millisecond of each other and the overall process only had to wait 7 seconds for the X task to complete. The Y task was already done, with its result waiting in yFuture, when the X task returned.


Setting Timeout Limits

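Moving processing to a background thread to avoid locking up the core GUI execution thread is useful but it is still possible for a remote process to lock up for far longer periods which will still strain local resources. To set an upper bound on how long a call should wait for a response, the .completeOnTimeout() method (available since Java 9) can be called on a future to specify a timeout limit and a default value to use if no result has arrived by then. The method returns the same future it was called on, so the two variables in the sample below refer to one object. Note that the timeout only completes the future; it does not stop the underlying work, which keeps running on its worker thread. Here's sample code illustrating this process.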

        thisLog.info("======  TIMEOUT HANDLING ================================");
        thisLog.info("Initiating actionX via unlimitedXFuture ");
        CompletableFuture<String> unlimitedXFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionX(testX);
            }
        );
        thisLog.info("creating limitedAFuture via .completeOnTimeout()");
        CompletableFuture<String> limitedXFuture = 
            unlimitedXFuture.completeOnTimeout("TIMEOUTDEFAULT", 6, TimeUnit.SECONDS); 
        thisLog.info("starting non-blocking wait on limitedXFuture");
        String limitedResult = limitedXFuture.join();
        thisLog.info("limitedResult from limitedXFuture = " + limitedResult);

Here are the logs generated by the code.

2025-09-15 17:37:39:344 [main] INFO FUTURE - ======  TIMEOUT HANDLING ================================
2025-09-15 17:37:39:344 [main] INFO FUTURE - Initiating actionX via unlimitedXFuture
2025-09-15 17:37:39:346 [main] INFO FUTURE - creating limitedAFuture via .completeOnTimeout()
2025-09-15 17:37:39:346 [ForkJoinPool.commonPool-worker-1] INFO FUTURE - actionX() - starting / doing "work" for 7 seconds
2025-09-15 17:37:39:350 [main] INFO FUTURE - starting non-blocking wait on limitedXFuture
2025-09-15 17:37:45:352 [main] INFO FUTURE - limitedResult from limitedXFuture = TIMEOUTDEFAULT
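
For comparison, Java 9 also added .orTimeout(), which fails the future with a TimeoutException instead of supplying a default value. The sketch below contrasts the two, assuming Java 9 or later. The class name, the slowTask() helper and the short durations are hypothetical choices made to keep the demonstration quick.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutSketch {

    // Hypothetical slow task that needs 2 seconds to produce its value.
    static CompletableFuture<String> slowTask() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.SECONDS.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "real result";
        });
    }

    // completeOnTimeout(): a default value is supplied when the limit passes.
    static String withDefault() {
        return slowTask()
            .completeOnTimeout("TIMEOUTDEFAULT", 100, TimeUnit.MILLISECONDS)
            .join();
    }

    // orTimeout(): the future fails instead; join() wraps the TimeoutException.
    static boolean failsWithTimeout() {
        try {
            slowTask().orTimeout(100, TimeUnit.MILLISECONDS).join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    // completeOnTimeout() hands back the future it was called on, not a wrapper.
    static boolean returnsSameFuture() {
        CompletableFuture<String> original = new CompletableFuture<>();
        CompletableFuture<String> limited =
            original.completeOnTimeout("unused", 1, TimeUnit.SECONDS);
        original.complete("done");
        return limited == original;
    }

    public static void main(String[] args) {
        System.out.println(withDefault());        // prints TIMEOUTDEFAULT
        System.out.println(failsWithTimeout());   // prints true
        System.out.println(returnsSameFuture());  // prints true
    }
}
```

Which variant fits depends on whether the caller can make use of a placeholder value or needs to know that the limit was hit.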

Handling Exceptions

The examples so far have avoided the extra complexity of exceptions being returned from the code being executed asynchronously. These patterns might allow initial functionality in a project to be off-loaded to a background thread as desired and solve an immediate problem. Eventually, however, "child" code will inevitably encounter an exception for a null pointer, some network timeout, division by zero, etc. It is LIKELY that it is NOT WORTH gold-plating a solution to perfectly "cure" these corner cases but it IS likely worth ensuring such calls are wrapped with enough handling to safely absorb these exceptions rather than letting them bubble up uncaught and crash the application.

The approach to be taken to provide exception handling depends upon where any existing exception handling might be performed in the existing target code and where it is easiest to add new exception handling for the original business need and the CompletableFuture mechanism being added. These are the obvious choices:

  1. Containing exceptions within the target block (here, with the methods such as actionX() )
  2. Handling the uncaught exceptions of the target using logic within the .supplyAsync() block invoking the target block
  3. Altering the target block to throw required exceptions then adding logic in the .supplyAsync() block invoking the target block
  4. Supplying the future with explicit code to execute when exceptions are returned using the .exceptionally() method.

Regardless of WHERE exception handling is added, logic within that extra handling must choose between trying to intercept an exception and revert to returning a valid business object response OR tweaking / modifying the exception and using throw to push the exception on to an outer parent process. These examples will illustrate ensuring a default business object is returned.
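
For completeness, the second choice (translating the exception and throwing it onward) might look roughly like the sketch below. LookupFailedException is a hypothetical application exception invented for this illustration, and the failing task is contrived.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class RethrowSketch {

    // Hypothetical application-level exception; not part of the original example code.
    static class LookupFailedException extends RuntimeException {
        LookupFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static String rethrown() {
        CompletableFuture<String> future = CompletableFuture
            .<String>supplyAsync(() -> {
                throw new IllegalStateException("boom");
            })
            // Translate the low-level failure, then throw it on to the outer caller.
            .exceptionally(ex -> {
                throw new LookupFailedException("lookup failed", ex);
            });

        try {
            future.join();
            return "no exception";
        } catch (CompletionException e) {
            // join() wraps the translated exception in a CompletionException.
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(rethrown()); // prints LookupFailedException
    }
}
```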


Handling Exceptions within .supplyAsync()

In this example, the "supplyAsync()" code isn't the original target code such as the logic in actionX(); it is the extra wrapper code in the lambda function that calls the target. Visually, it is cleaner if the lambda block only has a single line but the lambda mechanism itself doesn't care how many lines of code are in that block. This allows extra exception handling to be added WITHOUT modifying the actual target code, which may be preferable in many scenarios.

In the example, the actionYRawException() method is the real "target" being executed and everything else inside the lambda is the "async" code supplied to the future. By making the call to actionYRawException() within a try / catch block, any exception thrown by the target (whether rethrown deliberately or never caught) can be intercepted before it reaches the future and an alternate Location object returned from the future instead.

        thisLog.info("====== EXCEPTION PROCESSING VIA .supplyAsync() CODE =====");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating actionYRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureTypedException = CompletableFuture.supplyAsync(  () -> 
            {
                try {
                    return actionYRawException(testY);
                    }
                catch (Exception theE) {
                   Location newresult = new Location();
                   newresult.latitude=99;
                   newresult.longitude=-99;
                   newresult.altitude=999;
                   thisLog.info("catch() in supplyAsync() block returning rigged Location=" + newresult);
                   return newresult;   // return this result as the outer result
                   }
            }
        );

        thisLog.info("calling inner future .join() for non-blocking wait to completion");
        Location locationResult2 = uncaughtFutureTypedException.join();
        thisLog.info("inner future .join() result = " + locationResult2);
        thisLog.info("final value of testy=" + testY);

Here is what the log output looks like when this code executes:

2025-09-15 17:37:45:353 [main] INFO FUTURE - ====== EXCEPTION PROCESSING VIA .supplyAsync() CODE =====
2025-09-15 17:37:45:359 [main] INFO FUTURE - input value of testy=BusinessObjectY = [ maker=Ford model=F150]
2025-09-15 17:37:45:359 [main] INFO FUTURE - Initiating actionYRawException via CompletableFuture
2025-09-15 17:37:45:360 [main] INFO FUTURE - calling inner future .join() for non-blocking wait to completion
2025-09-15 17:37:45:360 [ForkJoinPool.commonPool-worker-2] INFO FUTURE - actionYRawException() - starting / doing "work" for 3 seconds
2025-09-15 17:37:46:348 [ForkJoinPool.commonPool-worker-1] INFO FUTURE - actionX() - setting current datetime in delta object
2025-09-15 17:37:48:362 [ForkJoinPool.commonPool-worker-2] INFO FUTURE - catch() in supplyAsync() block returning rigged Location=Location = [ latitude=99.0 longitude=-99.0 altidude=999.0]
2025-09-15 17:37:48:365 [main] INFO FUTURE - inner future .join() result = Location = [ latitude=99.0 longitude=-99.0 altidude=999.0]
2025-09-15 17:37:48:366 [main] INFO FUTURE - final value of testy=BusinessObjectY = [ maker=Toyota model=4Runner]

Handling Exceptions using .exceptionally()

The .exceptionally() method of a future provides a different way of supplying exception handling code to the future to run when it sees an exception returned from child code. Code supplied via .exceptionally() is ONLY executed if a child process returns an exception to the future. For most scenarios, this is the cleanest approach for handling exceptions since the naming convention of the methods adds to code clarity and documenting intent.

In the example here, a future that calls the actionYRawException() method is created without any extra guardrails in the supplyAsync() wrapper code. A second future is created by calling the .exceptionally() method of the first future and supplying the desired exception handling code as a lambda to that method. The wait for the result is triggered by calling the .join() method of the second future, after which the result is available to the parent code.

        thisLog.info("====== EXCEPTION VIA INNER FUTURE .exceptionally() ===");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating actionYRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureRawException = CompletableFuture.supplyAsync(  () -> 
            {
                return actionYRawException(testY);
            }
        );
        CompletableFuture<Location> caughtFutureRawException = uncaughtFutureRawException.exceptionally( exception -> 
            {
            thisLog.error("caughtFutureRawException.exceptionally() -- " + exception);
            Location newresult = new Location();
            newresult.latitude=45;
            newresult.longitude=-75;
            newresult.altitude=999;
            thisLog.info("exceptionally() returning rigged Location=" + newresult);
            return newresult;   // return this result as the outer result
            }
        );

        thisLog.info("calling outer future .join() for non-blocking wait until completion");
        Location locationResult3 = caughtFutureRawException.join();
        thisLog.info("outer result from outer future - Location="+ locationResult3);
        thisLog.info("final value of testy=" + testY);
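
One detail worth knowing when writing an .exceptionally() handler: for work started with .supplyAsync(), the exception handed to the handler is typically a CompletionException wrapping the original one, so the root cause has to be unwrapped. The self-contained sketch below shows both the success path and the recovery path. The class name and the riskyDivide() helper are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class ExceptionallySketch {

    // Hypothetical target that throws ArithmeticException when divisor is 0.
    static int riskyDivide(int divisor) {
        return 100 / divisor;
    }

    static String divideSafely(int divisor) {
        return CompletableFuture.supplyAsync(() -> riskyDivide(divisor))
            .thenApply(n -> "result=" + n)
            .exceptionally(ex -> {
                // Unwrap the CompletionException to reach the original failure.
                Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                    ? ex.getCause()
                    : ex;
                return "recovered from " + cause.getClass().getSimpleName();
            })
            .join();
    }

    public static void main(String[] args) {
        System.out.println(divideSafely(4)); // prints result=25
        System.out.println(divideSafely(0)); // prints recovered from ArithmeticException
    }
}
```

On the success path the handler is skipped entirely and the value from .thenApply() passes straight through.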

Handling Exceptions using .handle()

The .handle() method of a future differs from the .exceptionally() method because it ALWAYS executes when the future's original code completes, whether a normal result or an exception is returned. The response flow of the .handle() call means the logic provided must handle BOTH successful responses AND exceptions. If the logic does not explicitly pass the original result object through on success, the caller receives whatever the handler returns instead (typically null). This seems to require duplicating code unnecessarily, can lead to confusion during testing, and makes the .handle() call not terribly favored among developers.

        thisLog.info("====== EXCEPTION PROCESSING VIA INNER .handle() ========");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating uncaughtFutureRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureRawException2 = CompletableFuture.supplyAsync(  () -> 
            {
                return actionYRawException(testY);
            }
        );
        CompletableFuture<Location> caughtFutureRawException2
            = uncaughtFutureRawException2.handle( (result, exception) -> 
            {
                if (exception !=null) {
                   thisLog.error("caughtFutureRawException.exceptionally() -- " + exception);
                   Location newresult = new Location();
                   newresult.latitude=45;
                   newresult.longitude=-75;
                   newresult.altitude=999;
                   thisLog.info(".handle() returning rigged Location=" + newresult);
                   return newresult;   // return this result as the outer result
                }
                else {
                   thisLog.info(".handle() returning original Location=" + result);
                   return result;
                }
            }
        );

        thisLog.info("calling outer future .join() for non-blocking wait until completion");
        Location locationResult4 = caughtFutureRawException2.join();
        thisLog.info("final .get() response= "+ locationResult4);
        thisLog.info("final value of testy=" + testY);

Here are the resulting logs.

2025-09-15 17:37:51:380 [main] INFO FUTURE - ====== EXCEPTION PROCESSING VIA INNER .handle() ========
2025-09-15 17:37:51:380 [main] INFO FUTURE - input value of testy=BusinessObjectY = [ maker=Ford model=F150]
2025-09-15 17:37:51:380 [main] INFO FUTURE - Initiating uncaughtFutureRawException via CompletableFuture
2025-09-15 17:37:51:381 [ForkJoinPool.commonPool-worker-2] INFO FUTURE - actionYRawException() - starting / doing "work" for 3 seconds
2025-09-15 17:37:51:382 [main] INFO FUTURE - calling outer future .join() for non-blocking wait until completion
2025-09-15 17:37:54:383 [ForkJoinPool.commonPool-worker-2] ERROR FUTURE - caughtFutureRawException.exceptionally() -- java.util.concurrent.CompletionException: java.lang.ArithmeticException: / by zero
2025-09-15 17:37:54:385 [ForkJoinPool.commonPool-worker-2] INFO FUTURE - .handle() returning rigged Location=Location = [ latitude=45.0 longitude=-75.0 altidude=999.0]
2025-09-15 17:37:54:387 [main] INFO FUTURE - final .get() response= Location = [ latitude=45.0 longitude=-75.0 altidude=999.0]
2025-09-15 17:37:54:388 [main] INFO FUTURE - final value of testy=BusinessObjectY = [ maker=Toyota model=4Runner]
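
The logs above only exercise the exception branch of .handle(). As a minimal sketch of both branches, the standalone class below runs the same handler against one call that succeeds and one that fails. The names HandleBothPaths, divide() and describe() are hypothetical and are not part of Main.java. The handler also unwraps the CompletionException seen in the log above to reach the original ArithmeticException, on the assumption that the handler is attached directly to a supplyAsync() future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class HandleBothPaths {

    // Hypothetical stand-in for the article's action methods: integer
    // division throws ArithmeticException when the denominator is zero.
    static int divide(int numerator, int denominator) {
        return numerator / denominator;
    }

    // Runs divide() on a background thread. The .handle() lambda runs
    // afterwards in BOTH cases: exactly one of result / exception is set
    // (result may legitimately be null for futures that return null).
    static String describe(int numerator, int denominator) {
        return CompletableFuture
            .supplyAsync(() -> divide(numerator, denominator))
            .handle((result, exception) -> {
                if (exception != null) {
                    // supplyAsync() wraps the thrown exception in a
                    // CompletionException, so look at the cause if present
                    Throwable cause =
                        (exception instanceof CompletionException && exception.getCause() != null)
                            ? exception.getCause()
                            : exception;
                    return "fallback after " + cause.getClass().getSimpleName();
                }
                // success branch: the handler must pass the result along itself
                return "result=" + result;
            })
            .join();
    }

    public static void main(String[] args) {
        System.out.println(describe(20, 4));   // prints "result=5"
        System.out.println(describe(20, 0));   // prints "fallback after ArithmeticException"
    }
}
```

Note that the handler's return type (String here) does not have to match the type of the original future (Integer here), which is one practical difference from .exceptionally().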

Handling Exceptions using .get()

The .get() method of a future is coded to return the future's response object for the happy path but throw one of three types of exceptions reflecting any error status of the future. Two of these, ExecutionException and InterruptedException, are checked exceptions declared by the method, so the call to .get() must be performed within a try / catch block. CancellationException is an unchecked exception but is worth catching explicitly as well.

In the example below, the actionYRawException() method is called by the future. The .get() method of that single future is called within a try block that includes catch clauses for these types of exceptions:

CancellationException thrown if prior code has called the .cancel() method of this future
ExecutionException thrown if the code wrapped by the future threw an exception during execution
InterruptedException thrown if the current thread waiting on this future has itself been interrupted

Here is the sample code.

        thisLog.info("====== EXCEPTION PROCESSING VIA INNER .get() ============");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating uncaughtFutureRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureRawException3 = CompletableFuture.supplyAsync(  () -> 
            {
                return actionYRawException(testY);
            }
        );
        thisLog.info("calling outer future .get() within try/catch to wait without blocking for completion");
        Location locationResult5=null;
        try {
           locationResult5 = uncaughtFutureRawException3.get();
        }
        catch (CancellationException ie) {
           thisLog.error("exception " + ie);        
        }
        catch (ExecutionException ee) {
           thisLog.error("exception " + ee);
           thisLog.error("setting rigged Location result -- 55 / 55 / 55");
           locationResult5 = new Location();
           locationResult5.latitude=55.0;
           locationResult5.longitude=55.0;
           locationResult5.altitude=55.0;
        }
        catch (InterruptedException ie) {
           Thread.currentThread().interrupt();   // restore the interrupt flag, as the action methods do
           thisLog.error("exception " + ie);
        }

        thisLog.info("final result from .get() or catch: location="+ locationResult5);
        thisLog.info("final value of testy=" + testY);

Here are the resulting logs from executing that code.

2025-09-15 17:37:54:389 [main] INFO FUTURE - ====== EXCEPTION PROCESSING VIA INNER .get() ============
2025-09-15 17:37:54:390 [main] INFO FUTURE - input value of testy=BusinessObjectY = [ maker=Ford model=F150]
2025-09-15 17:37:54:391 [main] INFO FUTURE - Initiating uncaughtFutureRawException via CompletableFuture
2025-09-15 17:37:54:392 [main] INFO FUTURE - calling outer future .get() within try/catch to wait without blocking for completion
2025-09-15 17:37:54:392 [ForkJoinPool.commonPool-worker-2] INFO FUTURE - actionYRawException() - starting / doing "work" for 3 seconds
2025-09-15 17:37:57:394 [main] ERROR FUTURE - exception java.util.concurrent.ExecutionException: java.lang.ArithmeticException: / by zero
2025-09-15 17:37:57:395 [main] ERROR FUTURE - setting rigged Location result -- 55 / 55 / 55
2025-09-15 17:37:57:396 [main] INFO FUTURE - final result from .get() or catch: location=Location = [ latitude=55.0 longitude=55.0 altidude=55.0]
2025-09-15 17:37:57:397 [main] INFO FUTURE - final value of testy=BusinessObjectY = [ maker=Toyota model=4Runner]
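
Comparing this log with the earlier .handle() log shows a subtle difference: the same ArithmeticException arrives wrapped in an ExecutionException when using .get() but wrapped in a CompletionException when using .join(). The standalone sketch below puts the two side by side and also shows the CancellationException case. The class GetVersusJoin and its methods are hypothetical names used only for illustration and are not part of Main.java.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

class GetVersusJoin {

    // Hypothetical failing task: integer divide by zero
    static int failingWork() {
        int denominator = 0;
        return 20 / denominator;
    }

    // Name of the wrapped exception, or "unknown" if there is no cause
    static String causeName(Throwable wrapper) {
        Throwable cause = wrapper.getCause();
        return (cause == null) ? "unknown" : cause.getClass().getSimpleName();
    }

    // .get() declares checked exceptions, so try / catch is mandatory
    static String viaGet(CompletableFuture<Integer> future) {
        try {
            return "value " + future.get();
        } catch (ExecutionException ee) {
            return "ExecutionException caused by " + causeName(ee);
        } catch (CancellationException ce) {
            return "CancellationException";
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();   // restore the interrupt flag
            return "InterruptedException";
        }
    }

    // .join() throws only unchecked exceptions, so try / catch is optional
    static String viaJoin(CompletableFuture<Integer> future) {
        try {
            return "value " + future.join();
        } catch (CancellationException ce) {
            return "CancellationException";
        } catch (CompletionException ce) {
            return "CompletionException caused by " + causeName(ce);
        }
    }

    public static void main(String[] args) {
        // prints "ExecutionException caused by ArithmeticException"
        System.out.println(viaGet(CompletableFuture.supplyAsync(() -> failingWork())));
        // prints "CompletionException caused by ArithmeticException"
        System.out.println(viaJoin(CompletableFuture.supplyAsync(() -> failingWork())));

        // a future that is cancelled before it ever completes
        CompletableFuture<Integer> cancelled = new CompletableFuture<>();
        cancelled.cancel(true);
        // prints "CancellationException"
        System.out.println(viaGet(cancelled));
    }
}
```

In both cases getCause() is the way back to the exception the background code actually threw.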

Appendix - Full Main.java Source

The entire source code of the Main.java class file containing all of these illustrations is provided below for convenience.

package com.mdhlabs.future;

import org.slf4j.*;
import java.lang.Thread;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;


public class Main {


    private static Logger thisLog = LoggerFactory.getLogger("FUTURE");


    static class BusinessObjectX {
        public int id;
        public String name;

        public BusinessObjectX() {};

        public String toString() {
            return "BusinessObjectX = [ id=" + id + " name=" + name + "]";
        }
    }

    static class BusinessObjectY {
        public String maker;
        public String model;

        public BusinessObjectY() {};

        public String toString() {
            return "BusinessObjectY = [ maker=" + maker + " model=" + model + "]";
        }
    }

    static class Location {
        public double latitude;
        public double longitude;
        public double altitude;

        public Location() {};

        public String toString() {
            return "Location = [ latitude=" + latitude + " longitude=" + longitude + " altidude=" + altitude + "]";
        }
    }


    public static String actionX(BusinessObjectX objectX) {
        thisLog.info("actionX() - starting / doing \"work\" for 7 seconds");
        try {
            TimeUnit.SECONDS.sleep(7);            
        }
        catch (InterruptedException theE) {
            Thread.currentThread().interrupt();
            thisLog.info("actionX() - thread was interrupted during sleep");
        }
        thisLog.info("actionX() - setting current datetime in delta object");
        objectX.id=2112;
        objectX.name="Rush";
        // this return value is returned as the result of the CompletableFuture that
        // wrapped this call
        return "Permanent Waves";
    }


    public static Location actionY(BusinessObjectY objectY) {
        thisLog.info("actionY() - starting / doing \"work\" for 3 seconds");
        objectY.maker="Toyota";
        objectY.model="4Runner";
        try {
            TimeUnit.SECONDS.sleep(3);            
        }
        catch (InterruptedException theE) {
            Thread.currentThread().interrupt();
            thisLog.info("actionY() - thread was interrupted during sleep");
        }
        Location result = new Location();
        result.latitude=39.9999;
        result.longitude=-70.999;
        result.altitude=140.888;
        // this return value is returned as the result of the CompletableFuture that
        // wrapped this call
        return result;
    }



    public static Location actionYRawException(BusinessObjectY objectY) {
        thisLog.info("actionYRawException() - starting / doing \"work\" for 3 seconds");
        objectY.maker="Toyota";   // alter these to show we have access to input parameters
        objectY.model="4Runner";  // alter these to show we have access to input parameters
        try {
            TimeUnit.SECONDS.sleep(3);            
        }
        catch (InterruptedException theE) {
            Thread.currentThread().interrupt();
            thisLog.info("actionYRawException() - thread was interrupted during sleep");
        }
        Location result = new Location();
        // this will throw an uncaught exception for divide by zero.
        int denominator = 0;
        int numerator = 20/denominator;
        thisLog.info("Attempted to force divide by zero exception " + numerator);
        result.latitude=39.9999/0;
        result.longitude=-70.999;
        result.altitude=140.888;
        // this return value is returned as the result of the CompletableFuture that
        // wrapped this call
        return result;
    }



    public static Location actionYTypedException(BusinessObjectY objectY) throws Exception {
        thisLog.info("actionYTypedException() - starting / doing \"work\" for 3 seconds");
        objectY.maker="Toyota";   // alter these to show we have access to input parameters
        objectY.model="4Runner";  // alter these to show we have access to input parameters
        try {
            TimeUnit.SECONDS.sleep(3);            
        }
        catch (InterruptedException theE) {
            Thread.currentThread().interrupt();
            thisLog.info("actionYTypedException() - thread was interrupted during sleep");
        }
        Location result = new Location();
        // this will throw an uncaught exception for divide by zero.
        int denominator = 0;
        int numerator = 20/denominator;
        thisLog.info("Attempted to force divide by zero exception " + numerator);
        result.latitude=39.9999/0;
        result.longitude=-70.999;
        result.altitude=140.888;
        // this return value is returned as the result of the CompletableFuture that
        // wrapped this call
        return result;
    }


    public static void main(String[] args) {
        System.out.println("Example coding for CompletableFuture operations!");

        thisLog.info("====== SYNCHRONOUS / SEQUENTIAL PROCESSING (BAD) ========");

        // initialize input objects with some values
        BusinessObjectX testX = new BusinessObjectX();
        testX.id=42;
        testX.name="Yes";
        BusinessObjectY testY = new BusinessObjectY();
        testY.maker="Ford";
        testY.model="F150";        

        // call these methods serially, sequentially and show inputs can be altered and 
        // results are returned after 10 seconds
        String testA = actionX(testX);
        thisLog.info("actionX() completed - testA = " + testA.toString());
        Location testB = actionY(testY);
        thisLog.info("actionY() completed - testB = " + testB.toString());

        // call A and B within CompletableFuture wrappers, then combine those wrappers in 
        // an outer CompletableFuture wrapper that waits for both asynchronously

        // reset the input objects back to original values
      

        // demonstrate a single CompletableFuture call and non-blocking wait

        thisLog.info("====== ASYNCHRONOUS PROCESSING  ========================");
        testX.id=42;
        testX.name="Yes";
        testY.maker="Ford";
        testY.model="F150";
        thisLog.info("Initiating actionX as a single CompletableFuture");
        CompletableFuture<String> singleFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionX(testX);
            }
        );
        thisLog.info("calling singleFuture.join() to wait wihtout blocking for completion");
        String singleResult = singleFuture.join();
        thisLog.info("after singleFuture completion: singleResult="+singleResult);

        // reset the input objects back to original values



        // demonstrate executing two CompletableFuture calls in parallel and non-blocking wait
        thisLog.info("====== PARALLEL FUTURES ================================");
        testX.id=42;
        testX.name="Yes";
        testY.maker="Ford";
        testY.model="F150";  
        thisLog.info("Initiating actionX via CompletableFuture");
        CompletableFuture<String> xFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionX(testX);
            }
        );
        thisLog.info("Initiating actionY via CompletableFuture");
        CompletableFuture<Location> yFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionY(testY);
            }
        );
        thisLog.info("Creating resultFuture CompletableFuture waiting on aFuture and bFuture");
        CompletableFuture<Void> resultFuture = CompletableFuture.allOf(xFuture, yFuture);
        thisLog.info("invoking resultFuture.join() - waiting for x and y to complete");
        resultFuture.join();

        resultFuture.thenRun( () -> 
           {
               thisLog.info("resultFuture.thenRun() - combining async results from aFuture and bFuture");
               // statements can reference variables passed TO the async methods through the CompletableFuture
               thisLog.info("testA = " + testA.toString());
               thisLog.info("testB = " + testB.toString());
               // statements can reference response values returned through the CompletableFutures
               String xFutureResponse = xFuture.join();
               Location yFutureResponse = yFuture.join();
               thisLog.info("xFutureResponse  = " + xFutureResponse);
               thisLog.info("yFutureResponse  = " + yFutureResponse);
           }
        );


        // demonstrate timeout / default handling
        thisLog.info("======  TIMEOUT HANDLING ================================");
        thisLog.info("Initiating actionX via unlimitedXFuture ");
        CompletableFuture<String> unlimitedXFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionX(testX);
            }
        );
        thisLog.info("creating limitedAFuture via .completeOnTimeout()");
        CompletableFuture<String> limitedXFuture = 
            unlimitedXFuture.completeOnTimeout("TIMEOUTDEFAULT", 6, TimeUnit.SECONDS); 
        thisLog.info("starting non-blocking wait on limitedXFuture");
        String limitedResult = limitedXFuture.join();
        thisLog.info("limitedResult from limitedXFuture = " + limitedResult);



/*        
        // demonstrate behavior with an uncaught exception in a single future
        thisLog.info("======================= UNCAUGHT EXCEPTION - THIS FAILS ========");
        thisLog.info("This approach doesn't handle the exception a) inside the method,");
        thisLog.info("b) in the supplyAsync() wrapper calling it, or c) by using ");
        thisLog.info(".handle() or .exceptionally() methods of CompletableFuture");
        thisLog.info("so the exception will crash the entire program.");
        thisLog.info("Initiating asyncActionException via CompletableFuture");
        CompletableFuture<Location> uncaughtFuture = CompletableFuture.supplyAsync(  () -> 
            {
               return actionYRawException(testY);
            }
        );

        thisLog.info("calling uncaughtFuture.join() to wait wihtout blocking for completion");
        Location locationResult = uncaughtFuture.join();
        thisLog.info("after uncaughtFuture.join() has returned with result or exception "+ locationResult);
*/


        // this code illustrates how an inner/outer CompletableFuture pair invokes code
        // that throws an Exception and catches that thrown exception to convert it to an
        // appropriate default value to return to business logic
        thisLog.info("====== EXCEPTION PROCESSING VIA .supplyAsync() CODE =====");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating actionYRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureTypedException = CompletableFuture.supplyAsync(  () -> 
            {
                try {
                    return actionYRawException(testY);
                    }
                catch (Exception theE) {
                   Location newresult = new Location();
                   newresult.latitude=99;
                   newresult.longitude=-99;
                   newresult.altitude=999;
                   thisLog.info("catch() in supplyAsync() block returning rigged Location=" + newresult);
                   return newresult;   // return this result as the outer result
                   }
            }
        );

        thisLog.info("calling inner future .join() for non-blocking wait to completion");
        Location locationResult2 = uncaughtFutureTypedException.join();
        thisLog.info("inner future .join() result = " + locationResult2);
        thisLog.info("final value of testy=" + testY);



        // this code illustrates how an inner/outer CompletableFuture pair invokes code
        // that isn't declared to throw exceptions but might anyway, requiring this code
        // to catch the Exception and synthesize an alternate response value instead
        // of propagating an uncaught exception
        thisLog.info("====== EXCEPTION VIA INNER FUTURE .exceptionally() ===");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating actionYRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureRawException = CompletableFuture.supplyAsync(  () -> 
            {
                return actionYRawException(testY);
            }
        );
        CompletableFuture<Location> caughtFutureRawException = uncaughtFutureRawException.exceptionally( exception -> 
            {
            thisLog.error("caughtFutureRawException.exceptionally() -- " + exception);
            Location newresult = new Location();
            newresult.latitude=45;
            newresult.longitude=-75;
            newresult.altitude=999;
            thisLog.info("exceptionally() returning rigged Location=" + newresult);
            return newresult;   // return this result as the outer result
            }
        );

        thisLog.info("calling outer future .join() for non-blocking wait until completion");
        Location locationResult3 = caughtFutureRawException.join();
        thisLog.info("outer result from outer future - Location="+ locationResult3);
        thisLog.info("final value of testy=" + testY);



        thisLog.info("====== EXCEPTION PROCESSING VIA INNER .handle() ========");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating uncaughtFutureRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureRawException2 = CompletableFuture.supplyAsync(  () -> 
            {
                return actionYRawException(testY);
            }
        );
        CompletableFuture<Location> caughtFutureRawException2
            = uncaughtFutureRawException2.handle( (result, exception) -> 
            {
                if (exception !=null) {
                   thisLog.error("caughtFutureRawException.exceptionally() -- " + exception);
                   Location newresult = new Location();
                   newresult.latitude=45;
                   newresult.longitude=-75;
                   newresult.altitude=999;
                   thisLog.info(".handle() returning rigged Location=" + newresult);
                   return newresult;   // return this result as the outer result
                }
                else {
                   thisLog.info(".handle() returning original Location=" + result);
                   return result;
                }
            }
        );

        thisLog.info("calling outer future .join() for non-blocking wait until completion");
        Location locationResult4 = caughtFutureRawException2.join();
        thisLog.info("final .get() response= "+ locationResult4);
        thisLog.info("final value of testy=" + testY);




        thisLog.info("====== EXCEPTION PROCESSING VIA INNER .get() ============");
        testY.maker="Ford";
        testY.model="F150";   
        thisLog.info("input value of testy=" + testY);
        thisLog.info("Initiating uncaughtFutureRawException via CompletableFuture");
        CompletableFuture<Location> uncaughtFutureRawException3 = CompletableFuture.supplyAsync(  () -> 
            {
                return actionYRawException(testY);
            }
        );
        thisLog.info("calling outer future .get() within try/catch to wait without blocking for completion");
        Location locationResult5=null;
        try {
           locationResult5 = uncaughtFutureRawException3.get();
        }
        catch (CancellationException ie) {
           thisLog.error("exception " + ie);        
        }
        catch (ExecutionException ee) {
           thisLog.error("exception " + ee);
           thisLog.error("setting rigged Location result -- 55 / 55 / 55");
           locationResult5 = new Location();
           locationResult5.latitude=55.0;
           locationResult5.longitude=55.0;
           locationResult5.altitude=55.0;
        }
        catch (InterruptedException ie) {
           Thread.currentThread().interrupt();   // restore the interrupt flag, as the action methods do
           thisLog.error("exception " + ie);
        }

        thisLog.info("final result from .get() or catch: location="+ locationResult5);
        thisLog.info("final value of testy=" + testY);


        thisLog.info("====== END OF ALL DEMOS =================================");

    }
}

Thursday, April 24, 2025

Recovering From a Read-Only Linux Boot

Nothing strikes fear into the heart of a system administrator or end user like logging into a system and finding their content GONE. As storage has shifted from hard drives to FLASH memory and file systems have become more reliable, loss of data from random failures has become far less likely to occur. However, there are a variety of administrative mistakes -- beyond accidental erase / remove commands -- that can APPEAR to result in missing data. Given the overall reliability of underlying hardware, it is useful to understand scenarios that can produce symptoms of data loss and learn troubleshooting techniques that can identify root cause and allow them to be resolved.

This post illustrates an administrative mistake involving an installation of OpenSSL that broke key processes within a Linux system and resulted in all content of the system's /home directory DISAPPEARING, along with other directories tied to remote NFS and Samba volumes. The problem took about 90 minutes to diagnose and ended with zero actual data loss.


The Original Administrative Mistake

This problem started after a new installation of OpenSSL was added to a Linux host running Fedora 43 so the OpenSSL libraries could be referenced when compiling / building a version of Python from source. To make the library directory of the OpenSSL build easy to reference in the Python build, OpenSSL was built from source and installed at /opt/openssl341. To ensure the OpenSSL shared modules were usable in the Python build, the /opt/openssl341/lib64 directory was added to the dynamic linker configuration file at /etc/ld.so.conf and the linker configuration reloaded by running /sbin/ldconfig to pick up the new library directory.

This allowed Python to be compiled but the Fedora host was not restarted after updating the dynamic linker. This deferred recognition of the fact that the new OpenSSL library modules were incompatible with various operating system components that used its libcrypto.so module.


How The Mistake Broke the System Boot

Many different modules for handling user authorization, logging and file system management rely upon the libcrypto module to function. Under the covers, the SELinux (Security Enhanced Linux) layer had detected differences between the new libcrypto.so module seen via the dynamic linker and the rest of the OpenSSL installation the OS was using and BLOCKED all access to the libcrypto.so module. That caused numerous processes to fail during startup. The kernel saw these failures and altered the boot configuration to mount the system's primary drive read-only instead of read-write.

Since the unexpected version of libcrypto.so broke the User Database Service (systemd-userdbd.service), higher level operating system functions requiring that layer to be functioning to control access to other processes and resources failed and could not perform required functions. One of those functions involves displaying entries in the filesystem. As a result, attempting to list any directories within /home which are owned by non-root users didn't just return the directory information with weird integer ID values for the owner and group, such listings returned NOTHING. This gave the impression the content was GONE, rather than merely unreadable or unwriteable.

Of course, enforcement of user-based security is also crucial to creating links to external file systems using NFS and Samba. As a result, the directory /gitrepo linked to a remote NFS storage volume on a TrueNAS server was not connected, making it appear like all local git repository content had been lost. Another directory /smb used to back up non-source related content was also unable to connect, making THAT content appear to have disappeared as well.

Luckily, the larger environment was configured with other Fedora and Windows systems with similar connections to those remote NFS and Samba volumes and all of those connections worked, proving the content was present and had not been lost. That made it easier to focus on finding a cause that could be corrected without loss of data.


Finding the Underlying Fault

The first problem that became apparent with the failed boot is that some of the first common sources of diagnostics such as /var/log/warn or /var/log/last that identify events in the most recent boot had no new content. They couldn't because the entire machine volume had been mounted read-only. Instead, the journalctl command provided similar details and quickly pointed to openssl being involved.

The first set of log messages that pointed out a problem involved HUNDREDS of these messages that were generated around the time the new version of OpenSSL was first installed and the dynamic linker configuration updated the day before.

Apr 23 15:00:20 fedora1 systemd-userdbd[82099]: /usr/lib/systemd/systemd-userwork: error while loading shared libraries: libcrypto.so.3: failed to map segment from sha>
Apr 23 15:00:20 fedora1 systemd-userdbd[577]: Worker 82099 died with a failure exit status 127, ignoring.
Apr 23 15:00:20 fedora1 systemd-userdbd[82100]: /usr/lib/systemd/systemd-userwork: error while loading shared libraries: libcrypto.so.3: failed to map segment from sha>
Apr 23 15:00:20 fedora1 systemd-userdbd[577]: Worker 82100 died with a failure exit status 127, ignoring.

After jumping ahead with a search in the output of the journalctl command to the current time around the most recent failed boot, error messages like these were seen in the logs:

Apr 23 16:27:05 fedora1 setroubleshoot[199695]: SELinux is preventing systemd-hostnam from 
execute access on the file /opt/openssl341/lib64/libcrypto.so.3.

So these clearly identified that the libcrypto.so.3 module was at fault and the specific location of that module was the NEW OpenSSL installation just added the prior day. Correcting the problem required pointing the system away from the new OpenSSL installation. The existing OS installation binaries and libraries had not been altered, only bypassed via the system $PATH and the dynamic linker configuration. Rolling back should be straightforward.

Right? Maybe. Maybe not.


Disabling the Faulty OpenSSL Installation

Since the server host altered the file system configuration to mount the main volume read-only, the /etc/bashrc controlling the system's default $PATH and the /etc/ld.so.conf configuration controlling the dynamic linker could be SEEN but they could not be EDITED. In order to alter the files and hide the presence of the /opt/openssl341 directory, the boot command specified on the GRUB menu at boot had to be altered to explicitly force the volume to boot in rw mode rather than ro mode.

In this case, the Fedora machine was a virtual machine guest running under Proxmox. "Console" access wasn't provided by direct connection with a keyboard and monitor to a physical machine but instead by the "Console" function within the Proxmox administrative GUI at http://192.168.99.2:8006/. That allowed access to the GRUB menu displayed during boot so the boot command could be edited. The actual boot command looked like this:

root=UUID=7d825ab0-3b7b-44de-8c2e-8f0c97a5cefb ro rootflags=subvol=root rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau "

and was changed to this:

root=UUID=7d825ab0-3b7b-44de-8c2e-8f0c97a5cefb rw rootflags=subvol=root rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau "

With that alteration, the system booted with read-write access to the /etc directory allowing the broken OpenSSL installation path references to be removed from $PATH and the dynamic linker. It actually proved to be the change to the dynamic linker that corrected the fault and allowed the system to boot cleanly in read-write mode.

NOTE. After altering the boot configuration in GRUB to specify rw instead of ro, after the system rebooted successfully, the boot option returned to ro (read-only). How is the system working if the boot configuration is telling the system to start in read-only mode? By default, the boot process WILL mount the system disk in read-only so processes running as part of the initrd (initialization RAM disk) can examine the volume to see if it was dismounted cleanly at shutdown or needs a file system check scan run. That initrd logic will alter the access mode to read-write if no issues are found. Forcing the access mode to read-write will cause initrd to leave it read-write, even if lower level failures are found during the file system check. This allows read-write access when the full OS boots. Extreme caution is required any time a volume is forced to read-write mode, however.


Key Lessons Worth Re-Learning

When systems operate well for extended periods of time, it can be very easy to forget old best-practices and even easier to forget key diagnostic techniques required to correct issues. The following lessons are worth highlighting from this particular fire drill.

  1. Manage the OS Installation of OpenSSL Separately From "User" Installations -- The OS installation of OpenSSL on Linux operating systems is crucial to MANY aspects of system startup and ongoing security. Changes that alter versions of binaries or shared libraries can trigger complex failures at reboot. If a different version of OpenSSL is needed for "user" purposes, build it into a user directory and alter user-specific environment settings to use that installation for end-user functions.
  2. Always Include a Reboot When Altering the OS OpenSSL Installation -- Anything that breaks the OS installation of OpenSSL can trigger these failures at reboot, rendering a machine potentially unreachable except by console, making it vastly more difficult to fix. If a reboot is performed IMMEDIATELY after updating the OS OpenSSL, the likely cause will be immediately obvious. If a reboot is NOT performed until days / weeks later and the system becomes unreachable or unusable, the failure will generate MUCH more confusion and take longer to troubleshoot and resolve.
  3. Treat Python Installations the Same as OpenSSL Installations -- Many Linux distributions have adopted Python for use in many of their package administration utilities and desktop related functions. Making changes to the "OS" instance of Python for use with user-level projects is NOT wise. Install "user" builds of Python as an end-user and update user-level $PATH settings to use the user instance instead of the system instance.
  4. Remember Partial Absence of Subdirectories Can be Security Related Rather than Hardware Related -- If a Linux machine boots in read-only mode, missing content tied to specific userids is NOT likely a hardware fault and is likely recoverable. DO NOT give up on the "lost" data and DO NOT confuse the possible recovery by immediately attempting to find replacement copies from other media. Try to cure the logical problem first and exhaust all possibilities.

Tuesday, February 4, 2025

Digital Oscilloscope Screen Captures

Any troubleshooting or design work involving audio equipment or digital logic is often sped up by using an oscilloscope to look at analog wave forms or analyze digital signals and alignment across a circuit. When documenting such troubleshooting and design work, being able to capture a signal trace on an oscilloscope is helpful in communicating a diagnosis or design. Most modern digital scopes provide a USB host port that allows a USB thumb drive to be plugged in and used as the destination in writing screen dumps to paste into other documents.

Since at least 2015, many digital scope makers have expanded beyond this simple approach to capturing screen images by implementing networking and VISA (Virtual Instrument Software Architecture) protocols developed by National Instruments that add significant automation and scripting capabilities to a variety of laboratory gear. APIs implementing these VISA standards have been implemented in a variety of languages including Python, C#, Java and .NET.

(You can see where this is going...)

This capability exists on virtually all scopes. The process of using it will be demonstrated using a Rigol DHO924S scope, which runs atop the Android operating system, with Python as the scripting language. However, these newer scopes also make screen captures possible without any scripting using functions built into their browser interfaces. Both approaches will be shown. The browser approach is easier for rare, occasional use but the ability to script a capture allows it to be used within a larger process that might also be scripted.


Connecting to the Scope - USB or IP

Rigol scopes accept connections via a USB Device port tied to a laptop / desktop computer or via IP. Use of the USB interface might be preferable for selected tasks and seems logically preferable given that Rigol scopes do not (yet) have WiFi IP connectivity and a wired Ethernet connection may not always be close to where the scope is being used. However, communicating with a Rigol scope over USB requires installation of an application developed by Rigol called UltraSigma, whose user interface components were last altered around 2016 but visually appear to be coded using mid-1990s frameworks. Given the age of the software, it requires installation by a user with Admin privileges on Windows.


Finding the Scope's VISA Address

VISA libraries use identifiers in a specific format to identify a specific lab device. When accessing a Rigol scope via IP or USB, that identifier will take one of these forms:

TCPIP::192.168.99.29::INSTR
USB0::0x1AB1::0x044C::DHO9S254201528::INSTR
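
To make the structure of those two address forms concrete, here is a minimal sketch that splits a VISA address on its "::" separators and labels the pieces. The function name describe_visa_address and the dictionary keys are hypothetical names chosen for illustration; they are not part of pyvisa or any Rigol software, and the sketch only handles the two forms shown above.

```python
from typing import Dict


def describe_visa_address(address: str) -> Dict[str, str]:
    """Split a VISA address into labeled parts (illustrative helper only)."""
    parts = address.split("::")
    interface = parts[0].upper()
    if interface.startswith("TCPIP") and len(parts) >= 3:
        # Form: TCPIP::<host or IP>::INSTR
        return {
            "interface": "TCPIP",
            "host": parts[1],
            "resource_class": parts[-1],
        }
    if interface.startswith("USB") and len(parts) >= 5:
        # Form: USB<board>::<vendor id>::<product id>::<serial>::INSTR
        return {
            "interface": "USB",
            "vendor_id": parts[1],
            "product_id": parts[2],
            "serial": parts[3],
            "resource_class": parts[-1],
        }
    raise ValueError(f"Unrecognized VISA address: {address!r}")


print(describe_visa_address("TCPIP::192.168.99.29::INSTR"))
print(describe_visa_address("USB0::0x1AB1::0x044C::DHO9S254201528::INSTR"))
```

Seen this way, the IP form carries only the network address while the USB form carries the vendor ID, product ID and serial number of the specific instrument.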

If IP connectivity is used, the VISA address will be visible in the Rigol scope's Utility sub-function as soon as the scope boots and pulls an IP address from the DHCP server. The screen will look like this:

Note that it IS possible to statically assign the IP so the scope obtains the same IP address consistently, avoiding the need to possibly change this IP reference in the VISA address in the script. However, in most small networks, DHCP servers in gateway routers will typically re-assign the same IP to the same MAC unless they exhaust their available pool so changing IP addresses isn't often an issue.

If USB connectivity is used, the view in the scope will NOT be updated since a USB connection is not considered a "network" connection. Instead, the USB format VISA address can be identified in two ways. One way is to temporarily connect the scope to an IP network, surf to the scope's IP and glean the USB address from the top level Rigol Web Control view (see a sample screen dump further below). The other way is to install the UltraSigma software package from Rigol, then run its parent utility program to display the VISA string.

The view displaying the USB VISA address looks like this in that parent application:

Since the USB designation may change based on which physical USB port on the computer is used and which USB controller is driving that physical port, this information must be discovered from the PC end. There's no way to predict it by looking at information in the scope's displays.
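
Discovery from the PC end can also be scripted. The sketch below is a hedged illustration, not a tested recipe for this scope: it filters a list of VISA address strings down to USB entries whose vendor field matches 0x1AB1, the vendor ID that appears in the example address above. The helper name usb_addresses_for_vendor is hypothetical. The vendor field is compared numerically on the assumption that a VISA backend might report it in either hexadecimal or decimal.

```python
from typing import Iterable, List


def usb_addresses_for_vendor(resources: Iterable[str],
                             vendor_id: int = 0x1AB1) -> List[str]:
    """Return the USB VISA addresses whose vendor field equals vendor_id."""
    matches = []
    for address in resources:
        fields = address.split("::")
        if len(fields) < 2 or not fields[0].upper().startswith("USB"):
            continue  # not a USB-style address
        try:
            # base 0 accepts both "0x1AB1" and a plain decimal string
            vendor = int(fields[1], 0)
        except ValueError:
            continue  # vendor field is not numeric; skip it
        if vendor == vendor_id:
            matches.append(address)
    return matches


# Sample input shaped like a VISA resource listing (made-up for illustration).
sample = (
    "ASRL1::INSTR",
    "USB0::0x1AB1::0x044C::DHO9S254201528::INSTR",
    "TCPIP::192.168.99.29::INSTR",
)
print(usb_addresses_for_vendor(sample))
```

With pyvisa installed and a working VISA backend, the input could come from pyvisa.ResourceManager().list_resources(), which returns a tuple of address strings for the instruments the backend can see.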

Using Python and pyvisa for Captures

With the scope connected via IP or USB and its VISA address identified, the logic required to make VISA calls to address the scope and trigger a screen capture is very simple in Python. First, two libraries are required which can be installed via these commands.


pip install -U pyvisa
pip install -U pyvisa-py

As of February 4, 2025, these will install version 1.14.1 of pyvisa and version 0.7.2 of pyvisa-py.

With those libraries installed, a simple script like the following will allow an output filename to be specified along with an option to include a datetime stamp like 20250204193059 in the filename.

import pyvisa
import argparse
import datetime

# use argparse to parse arguments for filename and optional datestamp
parser = argparse.ArgumentParser(
    prog="capturescope",
    description='Captures screen dumps via VISA protocol from Rigol oscilloscope',
    epilog='syntax: capturescope.py filename --timestamp'
)
parser.add_argument('filename', help='name of file without extension to write')
parser.add_argument('-t', '--timestamp',
                    help='adds yyyymmddhhmmss timestamp to filename',
                    action='store_true')
args = parser.parse_args()

fullfilename = args.filename
if args.timestamp:
    # need to get the current yyyymmddhhmmss timestamp
    now = datetime.datetime.now()
    yyyymmddhhmmss = now.strftime("%Y%m%d%H%M%S")
    fullfilename = fullfilename + '.' + yyyymmddhhmmss
fullfilename = fullfilename + '.png'
print("Writing screen capture to: ",fullfilename)

# Connect to the oscilloscope
rm = pyvisa.ResourceManager()
# here is my scope's reference when connected via TCPIP
scope = rm.open_resource('TCPIP::192.168.99.29::INSTR')
# here is my scope's reference when connected via USB to my laptop
# find this via the UltraSigma app from Rigol or temporarily connect via IP
# then surf to http://ipaddress
# scope = rm.open_resource('USB0::0x1AB1::0x044C::DHO9S254201528::INSTR')

# Set the timeout (in milliseconds)
scope.timeout = 5000

# Get the screenshot as a list of unsigned byte values
screenshot = scope.query_binary_values(':DISP:DATA?', datatype='B')

# Save the screenshot as a PNG file
with open(fullfilename, 'wb') as f:
    f.write(bytes(screenshot))

# Close the connection
scope.close()
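
The script above writes whatever bytes the scope returns into a file named *.png. As an optional safeguard, the sketch below checks for the standard 8-byte PNG file signature before writing. This assumes the scope returns PNG data, as the script's file extension implies; a different model or firmware could conceivably return another image format, which this check would flag. The names looks_like_png and save_capture are hypothetical helpers, not part of pyvisa.

```python
# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if data starts with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


def save_capture(data: bytes, filename: str) -> None:
    """Write data to filename, refusing anything that is not PNG."""
    if not looks_like_png(data):
        raise ValueError("capture does not start with a PNG signature")
    with open(filename, "wb") as f:
        f.write(data)


# Stand-in byte strings for illustration; a real capture would be far larger.
print(looks_like_png(PNG_SIGNATURE + b"rest-of-image"))
print(looks_like_png(b"BM-some-other-format"))
```

In the capture script, the final write step could then become save_capture(bytes(screenshot), fullfilename).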

With that script, anything present on the screen can be captured using commands like this:

c:\Docs\gitwork\labutils>python capturescope.py scopeaddress --timestamp
Writing screen capture to:  scopeaddress.20250203215700.png

c:\Docs\gitwork\labutils>python capturescope.py negativeclock --timestamp
Writing screen capture to:  negativeclock.20250203221010.png

c:\Docs\gitwork\labutils>python capturescope.py positiveclock --timestamp
Writing screen capture to:  positiveclock.20250203221056.png

c:\Docs\gitwork\labutils>dir
 Volume in drive C is OS
 Volume Serial Number is 5841-F07E

 Directory of c:\Docs\gitwork\labutils

02/03/2025  10:15 PM    <DIR>          .
02/03/2025  10:15 PM    <DIR>          ..
02/03/2025  09:55 PM             1,327 capturescope.py
02/03/2025  10:10 PM            74,996 negativeclock.20250203221010.png
02/03/2025  10:10 PM            76,419 positiveclock.20250203221056.png
02/03/2025  09:57 PM           103,686 scopeaddress.20250203215700.png
               4 File(s)        256,428 bytes
               2 Dir(s)  500,848,386,048 bytes free

c:\Docs\gitwork\labutils>

Captures via Rigol Web Control

When connected to an IP network, most (all?) Rigol scopes expose a web server on the scope's IP address, without SSL encryption or login protection, that allows any function that can be performed using the touch screen on the scope to be performed via clicks in a browser window. For a scope assigned 192.168.99.29 as its IP, surfing to http://192.168.99.29 will display this screen.

Clicking on the Web Control button will pop open a new browser window with a full window matching the scope's live touchscreen like this. NOTE: It is worth mentioning that this browser view is IDENTICAL in functionality to the touch screen on the scope itself. Any action that can be performed by touching the screen on the scope can be performed by clicking on the same spot on this browser view. HANDY.

It is certainly possible to capture the scope screen using the PC's screen capture utilities (like Windows-Shift-S in Windows) to capture the image from this browser view.

It is also possible to use the Print Screen button which displays a different browser page allowing a choice between a static snapshot and a recording.

If the Take Screenshot button is clicked, a screenshot will be captured and rendered in that browser window. At that point, you can right-click on it, choose Save Image As... then write the file wherever desired on the local PC. If the goal is to capture a live change in a signal, the Record Screen button allows control over the start and stop then prompts for the filename and destination to save the *.mp4 video file to the browser PC.