
Software Development Complexity Solutions: An Overview of Emergence and Scaffolding Methods

Author | Greg Brougham

Translated by | Wang Qiang

Planning | Ding Xiaoyun

This article outlines a new approach to system development based on emergence and scaffolding. We worked from three key principles, and the underlying changes we made came from adopting an approach called emergence and introducing the concept of scaffolding:

We focus on business needs and no longer have Product Owners, who are usually just proxies for the business side. We acknowledge that customer needs are often emergent and difficult to articulate, so we prefer to reach a broad consensus and jointly explore what makes sense to do next.

We move from one stable state to the next. In doing so, we make a series of small commitments to move in the desired direction, with the ability to adjust course or stop if the results are not what we hoped for. These stable states allow us to "pick up the pace" and deliver the next stable state with business value.

We leverage scaffolding to accelerate learning and delivery. Beyond supporting the delivery process, we also used it to build a knowledge framework, because that proved helpful too. The focus is not only on technology, but also on knowledge.

Embracing the emergence approach allows us to explore latent needs in a collaborative way. As mentioned earlier, we do not set up a privileged "Product Owner"; instead we deal with the business directly and explore needs from the business side's perspective. Business value can be direct or indirect. Direct means it is part of a sale or a service that the customer is charged for directly. Indirect means the value shows up in accelerating delivery or supporting the company's maintenance work. Communicating directly with the business helps us explore what kind of business value and benefit is at stake.

Using scaffolding not only lets us deliver faster; it also lets us defer developing explicit knowledge of the problem domain. This "knowledge" scaffolding proved very useful, buying us time to dig into the underlying technologies and communication protocols relevant to the problem domains of the example use case.

We believe these practices are also useful and valuable for other development projects. We see aspects of this way of working as complementary to other approaches, such as continuous architecture.

Emergence

The concept of emergence is not widely familiar, but at a basic level it means admitting that we do not know everything at the outset. It is only through interaction and exploration that we come to understand what is useful; this is what we call emergence. While needs may only surface over time, we usually have some ideas and judgments about what we want and/or the direction we want to take, and these can be used to set guiding constraints.

Compare this with the traditional approach of defining a set of requirements up front: those requirements are often based on a series of unvalidated assumptions and may therefore be incorrect or incomplete. It is true that before we can start writing code we need to know what we are building. So we need a way to bridge this gap while still allowing previously unclear requirements to emerge during development; some requirements can only be clarified through actual use of the system.

We do need some notion of "complete" before development work starts, and what that means has to be assessed case by case. There is an old saying that a problem well defined is half solved. Lean has the notion of a "complete kit", which says not to start a task until all the relevant information and materials are available.

Tacit knowledge can be acquired naturally, while explicit knowledge must be tacitly understood and applied. All knowledge is therefore either tacit or rooted in tacit knowledge; fully explicit knowledge does not exist.

We can also look at this from a requirements perspective. I think there are three types of knowledge, which map onto what Cynefin calls the clear, complicated, and complex domains:

Known knowns, or things that we, or someone, already understand

Known unknowns, or things we know that we do not know

Unknown unknowns, which have not yet been discovered

Some would argue that known knowns should be easy to handle and easy to explain, but caution is required even here. I remember working on a new card settlement system where we needed to deal with blacklisted cards. Our assumption was that a card was either on the blacklist or not. But we were told that the current system might return "yes", "no", or "maybe", and no one could explain how the last one came about. We had mistakenly assumed the problem was clear and straightforward when it was actually a complicated one that was time-consuming and expensive to resolve.
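To make the trap concrete, here is a minimal sketch (in Python, with hypothetical names; this is not the actual settlement system's code) of how the binary assumption collapses once the legacy system's third answer shows up:

```python
from enum import Enum

class BlacklistStatus(Enum):
    """Possible answers from the legacy card-blacklist check.

    We assumed a boolean; the legacy system actually returned a
    third value whose origin nobody could explain.
    """
    NO = "no"
    YES = "yes"
    MAYBE = "maybe"  # the surprise third state

def requires_manual_review(status: BlacklistStatus) -> bool:
    # Treat anything that is not a definite "no" as needing review:
    # a safe default once the binary assumption no longer holds.
    return status is not BlacklistStatus.NO
```

The point is not the code itself but that a "known known" (a boolean check) turned out to hide a third state that forced a redesign.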

We have plenty of experience addressing the second type — known unknowns — and you could argue that many agile practices help surface these needs; practices such as innovation games are useful here as well. Iterative development also helps, because it allows us to articulate these elements and fold them in as they become clear.

The challenge arises with unknown unknowns — needs that no one has yet discovered. If we don't know that something is unknown, we need a way to deal with its emergence, and this is a problem traditional development methods struggle with. We also have to acknowledge that implementation requires clear requirements: we need at least the outline of an approach or solution before we can write code. Here we see the value of Cynefin's liminal state. In this state we remain open to a variety of options, exploring what is useful, and only move into development work once we have reached a consensus. In practice, we found that customers tend to be clear about what they are asking for from their own point of view, but we need enough clarity to confirm what will be delivered and to reach a consensus before making a commitment. This requires not just confirming the requirements, but also defining them clearly enough to support a definition of "done".

Related to this, we are curious about interfaces but do not focus on them, because our focus is on functional areas, which are also the business's focus. Scaffolding can be useful here, since it provides predetermined natural boundaries that need to be observed. A functional area may become a problem domain, but that is not predetermined.

From a delivery perspective, we work in small chunks that help us explore the architecture of the system and the features it needs to support. These are all value steps, and we use epics because the agile community understands the concept and it provides the right level of granularity. These pieces of work consist of a number of suitably sized stories and tasks. One story might look at a specific interface to support required functionality; we acknowledge that both may evolve over time and that we can barely predict where they will go. An example of an epic might be monitoring of the system, which is not purely functional and may also contain non-functional elements.

At the story level we may explore a number of options that provide a certain feature or capability. We want to explore cheaply and quickly in order to realize value. We do not commit the work to the feature itself until we fully understand how to support and deliver it, at which point it is added to the epic. In this way the system architecture keeps being explored and improved at every step, and each step is a "stable state" that adds value.

This way we don't need detailed up-front planning, and we can focus on what is valuable to the business while remaining open to emerging needs. We don't maintain a large backlog, which also helps with prioritization, keeps the business genuinely agile, and lets us respond to unanticipated customer needs.

Scaffolding

To support the emergent approach, we also use scaffolding, which helps guide the system and provides initial stability. There are many different kinds of scaffolding, but at a high level it can be internal or external, temporary or permanent. Internal scaffolding usually provides a structure for you to build on; by definition it may become part of the structure and is therefore inherently permanent. External scaffolding tends to constrain the system while supporting its movement to the next stable state, and because it is external it is usually temporary: in the long run it becomes redundant once the internal structure can support itself.

People often think of scaffolding as merely something that lets you defer work, but what is often overlooked is that scaffolding lets us defer the knowledge-exploration process, because it already "encodes" domain knowledge. This is not just knowledge of how we build, but direct business knowledge that we no longer need to spend time acquiring. If that is the case, we can use it to deliver business value as early as possible without knowing the business domain in advance (we are deferring the process and allowing time for tacit knowledge to develop).

We can also use scaffolding to bind or bridge the individual elements of the system. Here we can use ready-made tools that exchange information in a suitable canonical structure, since these are more efficient than some generic formats. These elements become part of the system and persist for the long term.
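As an illustration of what such a canonical structure might look like, here is a minimal sketch in Python; the field names and event kinds are illustrative assumptions, not taken from the original system:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class CanonicalEvent:
    """A minimal canonical envelope that scaffolded components can
    exchange, regardless of which tool produced the event."""
    source: str    # e.g. the installation or gateway identifier
    kind: str      # e.g. "telegram", "metric", "alert"
    payload: dict  # component-specific body
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Usage: any component can emit events in the same shape.
event = CanonicalEvent(
    source="site-001", kind="metric",
    payload={"group_address": "1/2/3", "value": 21.5},
)
```

Agreeing a single envelope like this up front is itself a piece of internal scaffolding: it becomes part of the system and persists.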

In addition, scaffolding allows core elements to be integrated early in the delivery cycle, supporting the process of testing any assumptions that have been made. Combined with a trunk-based development process, this means every change is implicitly evaluated holistically, so we do not need explicit integration testing.

The scaffolding here is a collection of open source implementations, chosen based on a simple functional decomposition of the high-level business requirements and an appropriate level of abstraction in the underlying technical domain. As mentioned earlier, this means we do not need a detailed or extensive analysis of business needs; a short conversation with the business side often yields enough information.

Once we have a basic understanding of the business needs, we can look at options for bootstrapping the system. This may require multiple components; we may find code that addresses a specific business need but still need a way to store and manage the data.

The key point in choosing an option (setting language preference aside) is an appropriate level of abstraction for the problem domain, which goes back to scaffolding knowledge. We want simple, intuitive interfaces so that we avoid the need to develop tacit domain knowledge unless we actually have that need. Where we are familiar with a particular aspect, such as data representation, we can use that to guide the initial selection of a technology or codebase.

Case study – KNX monitoring

This case required the development of a KNX monitoring system that could present and analyze all the devices deployed in an installation. KNX is a binary protocol, defined as an ISO standard, that supports the automation and monitoring of all elements of a home or office. In this case we had a basic functional decomposition that let us understand where we wanted to go, but knowledge was lacking in various areas. The general direction was that we needed to collect and present metrics from deployments, then raise alarms, and eventually close the loop so that a metric could trigger an action. We also needed to support massive scaling, as there could be a large number of deployments.

These capabilities are specific to this scenario, so in other cases you would go through a similar process. It is worth noting that these functional areas were distilled from discussion and were identified and emerged over time. We did not set out to conduct a lengthy or detailed analysis; instead we started moving as soon as we had identified the initial capabilities and the first value step. Those initial capabilities were collection and presentation, because they provide the required visibility and cover most of the technical issues, such as connectivity and the collection and presentation of device events. The key capabilities built over time are summarized below:

Collection – based on integrating with the installations and the KNX ETS files across multiple locations, so that device types are known and can be mapped (see the sketch after this list).

Presentation – visualization of events, metrics, and the status of the devices and groups that make up an installation.

Processing – analysis of events and alerting on conditions.

Acting – sending alerts to support actions, and automating actions within the system.
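As a concrete illustration of the collection capability, here is a minimal sketch of mapping incoming KNX group addresses to device metadata taken from an ETS export. The flat CSV format used here (address, name, DPT) is a simplifying assumption; real ETS project exports are richer, and the function and type names are illustrative:

```python
import csv
from typing import Dict, NamedTuple

class GroupAddressInfo(NamedTuple):
    name: str
    dpt: str  # KNX Data Point Type, e.g. "9.001" (temperature)

def load_ets_export(path: str) -> Dict[str, GroupAddressInfo]:
    """Build a lookup from KNX group address to device metadata,
    assuming a CSV export with address, name, and dpt columns."""
    mapping: Dict[str, GroupAddressInfo] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            mapping[row["address"]] = GroupAddressInfo(
                name=row["name"], dpt=row["dpt"]
            )
    return mapping

def describe(group_address: str,
             mapping: Dict[str, GroupAddressInfo]) -> str:
    """Turn a raw telegram address into a human-readable label."""
    info = mapping.get(group_address)
    if info is None:
        return f"{group_address}: unknown device"
    return f"{group_address}: {info.name} (DPT {info.dpt})"
```

With such a mapping in place, raw telegrams can be labelled with device types at the point of collection, before they flow on to presentation and processing.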

Basic scaffolding support comes from leveraging open source implementations for the KNX DPT (Data Point Type) elements and for the main data storage and processing. These elements can be replaced later if a business or performance problem arises. We also adopted an "event sourcing" approach to ensure that all events are captured. This not only supports observability but also ensures that we can recreate any specific view we need, thereby postponing specific design decisions.
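To show why event sourcing postpones design decisions, here is a minimal sketch, not the project's actual implementation: events are appended once, and any view is rebuilt later by folding over the full history, so new views can be introduced without changing what was captured.

```python
from typing import Callable, Iterable, List

class EventLog:
    """Append-only event log: events are captured once and views
    (projections) are rebuilt from them on demand."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> None:
        # In a real system this would be durable storage; an
        # in-memory list keeps the sketch self-contained.
        self._events.append(event)

    def replay(self) -> Iterable[dict]:
        return iter(self._events)

def project(log: EventLog, reducer: Callable[[dict, dict], dict],
            initial: dict) -> dict:
    """Rebuild any view by folding a reducer over the history."""
    state = initial
    for event in log.replay():
        state = reducer(state, event)
    return state

# Example view: last known value per group address.
def last_value(state: dict, event: dict) -> dict:
    state[event["group_address"]] = event["value"]
    return state

log = EventLog()
log.append({"group_address": "1/2/3", "value": 21.5})
log.append({"group_address": "1/2/3", "value": 22.0})
print(project(log, last_value, {}))  # {'1/2/3': 22.0}
```

Because the log is the source of truth, a new projection (say, event counts per device) is just another reducer over the same history.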

On the open source side we primarily look for the MIT license because of the flexibility it provides, but we are also willing to use Apache-licensed components for parts that are unlikely to require changes. The idea is to stand up the system architecture in a short time while reducing the cost of learning the complexities of KNX and its device and group address structure. We complemented this with the Influx TICK stack for data collection and processing. Over time we can swap out or recode elements of the stack as they become constraints, but the core stack can handle thousands of events per second, and we see a typical KNX deployment producing 800-1000 events per hour. This means the current rate for a location averages well under one event per second.
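As a sketch of how device events might land in the stack, here is a write of a single KNX event as a point using the official influxdb-client Python package (the InfluxDB 2.x API; the URL, token, org, bucket, and tag names are placeholders, and this is not the project's actual code):

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details for a local InfluxDB 2.x instance.
client = InfluxDBClient(url="http://localhost:8086",
                        token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One KNX event becomes one point: tags for identity, field for value.
point = (
    Point("knx_event")
    .tag("installation", "site-001")
    .tag("group_address", "1/2/3")
    .field("value", 21.5)
)
write_api.write(bucket="knx", record=point)

# Sanity check on the quoted rates: 800-1000 events per hour is
# roughly 0.22-0.28 events per second per installation, far below
# the thousands of events per second the stack can absorb.
```

The headroom between per-site event rates and the stack's throughput is what makes "replace it later if it becomes a constraint" a safe bet.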

As mentioned above, instead of pre-defining a set of requirements at the outset, we explore them in a collaborative way that allows for learning. We built our way of engaging around epics that are short pieces of work (days to a week) delivering some new capability. These capabilities reflect the customer's needs, which means we can take on new requirements or let the customer experience the system before the next step of development. This is iterative in nature, meaning development effort is always focused on what is needed. It also allows us to learn about the stacks and libraries we are using, which has been very effective so far.

These epics do not have to be purely functional; one example is the monitoring and recovery of installation connections. The basic epic covered recovery of a failed connection, but we added a story for hung connections to address a state we had observed. Solving this meant the system became largely self-healing and mostly runs unattended.

We also simplified our coding and testing approach, using a master/trunk (now often called "main") branch development model, so when we do feature development we commit and test all the code daily. We can use feature switches to limit the use of a feature if needed, though these are usually for additional requirements; for example, we used a feature switch to support the migration from InfluxDB V1 to V2.
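A minimal sketch of how such a switch can gate the V1-to-V2 migration on trunk follows; both write paths live on the main branch and the toggles decide which is active (or both, for dual-writing during cut-over). The flag names, config source, and stub functions are illustrative assumptions:

```python
import os

# Flags could equally come from a config service; environment
# variables keep the sketch self-contained.
FLAGS = {
    "influx_v1_writes": os.environ.get("INFLUX_V1_WRITES", "on") == "on",
    "influx_v2_writes": os.environ.get("INFLUX_V2_WRITES", "off") == "on",
}

def write_event(event: dict) -> None:
    # During migration both flags can be on, dual-writing every
    # event so the V2 store can be validated against V1.
    if FLAGS["influx_v1_writes"]:
        write_to_influx_v1(event)  # legacy path, retired after cut-over
    if FLAGS["influx_v2_writes"]:
        write_to_influx_v2(event)  # new path, enabled incrementally

def write_to_influx_v1(event: dict) -> None:
    ...  # V1 write (stub for this sketch)

def write_to_influx_v2(event: dict) -> None:
    ...  # V2 client write, as sketched earlier (stub)
```

The toggle lets the migration ship in small, reversible steps without a long-lived branch, consistent with the trunk-based model described above.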

Summary

We believe the practices we have explored to support emergence and scaffolding are broadly applicable to system delivery. The practices around emergence avoid the need for formal requirements and provide an easy way to work and interact with customers. Using scaffolding to address the technical requirements through open source meant we did not initially need deep knowledge of KNX technology, which translated directly into early business gains. It also meant that as the business needs became clearer, we both saved time and could respond to them.

References

Cynefin Framework (https://cynefin.io/wiki/Cynefin)

Cynefin scaffolding (https://cynefin.io/wiki/Scaffolding)

About the Author:

Greg Brougham is an experienced developer, architect, and technology leader. In recent years, he has served as the engineering director of a blockchain startup and defined the architecture for a telecommunications company's digital transformation initiatives. He also wrote a small book on the Cynefin complexity framework, and he still enjoys writing code when time permits.

https://www.infoq.com/articles/emergence-scafolding-complexity/
