Context-Driven Terraform for IBM Spectrum Symphony on IBM Cloud

Terraform Codebase Analysis and Refactoring

The analysis below compares the original resource-driven organization of the Terraform code with a concept-driven restructuring and suggests how the code might be refactored.

Project Structure Analysis: Original Terraform vs. a Concept-Driven Structure

Original Structure

The Terraform design for an IBM Cloud implementation of IBM Spectrum Symphony is publicly available. It is organized around technical resources with a provider-first approach, which makes it straightforward to implement on IBM Cloud, and it works very well as-is. However, for those unfamiliar with cloud implementations, the existing design can be a challenge to understand and work with when wading into the code for the first time.

In the current implementation, the root directory contains the top-level Terraform files (vpc.tf, variables.tf, outputs.tf). The rest of the directory is organized as follows, with each component divided by the technical resources involved, in line with IBM Cloud requirements. This is conceptual to a degree, but it does little to reveal the business purpose.

/resources/ibmcloud/ -- provider-specific modules categorized by resource type:

/compute/ -- Various compute resources (primary_vsi, secondary_vsi, etc.)
/network/ -- Network resources (vpc, subnet, floating_ip, etc.)
/security/ -- Security resources (security_group, security_group_rule, etc.)
/resources/scale_common/ -- Common modules for Spectrum Scale
/resources/windows/ -- Windows-specific modules
/scripts/ -- User data scripts and templates

Concept-Driven Structure

By way of contrast, a more deliberate concept-driven structure is organized around business concepts with a domain-first approach. What we’re really doing with this Terraform code is building a high-performance compute cluster on IBM Cloud so banks and investment firms can run highly efficient risk modeling and financial analysis. For those who don’t know, IBM Spectrum Symphony delivers enterprise-class management for running compute- and data-intensive distributed applications on a scalable, shared grid. This enables a dynamic compute platform where high-throughput tasks run across the grid in an orchestrated fashion as efficiently as possible.

It’s also important to understand that this implementation code is part of a larger family of Spectrum products that share a common Terraform design. This article offers a potential path forward that could extend to other repositories for use with IBM Cloud, in much the same way that Peter Wilczynski’s paper describes how concept-driven development helped address the complexity and challenges inherent in Palantir Technologies’ extensive codebase for its own products. So, as we look at the refactored Terraform code provided by Claude 3.7 Sonnet for IBM Spectrum Symphony, we’ll see that it is similar to the original design but also contains important concept-driven differences, as the directory structure shows:

/concepts/ contains business domain components:

/cluster/ - Primary, secondary, management, and worker node concepts
/storage/ - NFS and Spectrum Scale storage concepts
/networking/ - VPC, DNS, and security concepts
/access/ - Bastion host and authentication concepts
/workload/ - Scheduler and auto-scaling concepts

/infrastructure/ contains technical implementations:

/ibmcloud/ - IBM Cloud provider-specific implementations
/scripts/ - Implementation scripts
/common/ - Common utilities and helpers

Key Organization Principles

Original Approach:

Provider-Centric: IBM Cloud-specific implementations at the forefront
Resource Type Organization: Grouped by technical resource categories (compute, network, security)
Implementation-First Design: Exposes technical details at the top level
Tightly Coupled Modules: Direct references between modules creating high coupling
Technical Naming: Variables and parameters use technical terminology

Concept-Driven Approach:

Business Domain Organization: Grouped by business concepts (cluster, storage, networking)
Provider Abstraction: Technical details isolated in infrastructure layer
Business-First Design: Exposes business concepts at the top level
Interface-Based Communication: Well-defined interfaces between concepts with loose coupling
Business Terminology: Variables and parameters use business language

Main Modules and Their Purposes

Original Structure:

Compute Modules: Implement specific VM types (primary, secondary, management, worker nodes)
Network Modules: Implement network resources (VPC, subnets, DNS, floating IPs)
Security Modules: Implement security resources (security groups, rules, VPN)
Storage Modules: Implement storage solutions (NFS, Spectrum Scale)
Scale Common Modules: Implement Spectrum Scale integration

Concept-Driven Structure:

Cluster Concept: Coordinates all cluster nodes (primary, secondary, management, worker)

Technical Resources: Primary, secondary, management, and worker VMs

Purpose: Provides computational capacity and orchestration

Storage Concept: Manages storage options (NFS, Spectrum Scale)

Technical Resources: NFS or Spectrum Scale storage

Purpose: Enables data persistence and sharing

Networking Concept: Handles networking capabilities (VPC, DNS, security)

Technical Resources: VPC, subnets, DNS, security groups

Purpose: Delivers connectivity and isolation

Access Concept: Manages access control (bastion, authentication)

Technical Resources: Bastion host, SSH keys

Purpose: Facilitates secure entry and authentication

Workload Concept: Handles workload management (scheduler, scaling)

Technical Resources: Symphony host factory, auto-scaling

Purpose: Manages job scheduling and dynamic capacity

Concept Representation

In the concept-driven approach, these technical resources are represented through business concepts:

Cluster Concept:

Represents the HPC cluster as a whole
Encapsulates primary, secondary, management, and worker nodes
Defines cluster-wide properties like clustering software and orchestration

Storage Concept:

Abstracts storage capabilities independent of implementation
Provides interfaces for both NFS and Spectrum Scale options
Defines business-relevant properties like performance profiles and tiers

Networking Concept:

Provides connectivity capabilities
Manages domain name resolution
Handles security boundaries
Defines interfaces for other concepts to use

Access Concept:

Manages how users access the cluster
Handles authentication and security
Provides secure entry points

Workload Concept:

Manages how work is scheduled and distributed
Handles auto-scaling capabilities
Defines workload profiles and requirements

Interface Design

Module Communication Comparison

Original Approach:

Direct Resource References: Modules directly reference attributes of other resources
Implicit Dependencies: Dependencies inferred from resource references
Tight Coupling: Changes in one module often require changes in many others
Global Variables: Heavy use of global variables in root module
Local Value Sharing: Extensive use of locals for sharing values between resources

Example from original structure:

module "primary_vsi" {
  source = "./resources/ibmcloud/compute/primary_vsi"
  # Direct references to locals, data sources, and other modules' outputs
  image          = local.image_mapping_entry_found ? local.new_image_id : data.ibm_is_image.image[0].id
  profile        = data.ibm_is_instance_profile.management_node.name
  vpc            = data.ibm_is_vpc.vpc.id
  subnet_id      = module.subnet.subnet_id
  security_group = [module.sg.sg_id]
  # ...
}

Concept-Driven Approach:

Interface Objects: Modules communicate through well-defined interface objects
Explicit Contracts: Clear declarations of required inputs and outputs
Loose Coupling: Modules depend on interfaces, not implementations
Domain-Specific Variables: Variables defined in business terms
Clean Boundaries: Responsibility divided along business concept lines

Example from concept-driven structure:

module "primary" {
  source = "./primary"
  # Business concept parameters
  cluster_name     = var.cluster_name
  workload_profile = var.workload_profile
  # Interface-based dependencies
  network_interface = var.network_interface
  storage_interface = var.storage_interface
  # ...
}

Interface Design Patterns

The concept-driven structure uses several interface design patterns:

Interface Objects: Using Terraform objects to define clear interfaces between concepts:

variable "network_interface" {
  type = object({
    subnet_id       = string
    security_groups = list(string)
    dns_domain      = string
  })
}
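As a minimal sketch of the producing side of this contract, the networking concept could publish a matching object as an output. The file path, the output name, and var.dns_domain are assumptions made for illustration; module.subnet and module.sg are borrowed from the original example and are assumed to be declared elsewhere in the same module.

```hcl
# concepts/networking/outputs.tf (hypothetical file path)

# Assumed input; the refactored code may name or source this differently.
variable "dns_domain" {
  type        = string
  description = "DNS domain used for name resolution inside the cluster"
}

# Publishes one object whose shape matches the network_interface variable,
# so consumers never reach into individual networking resources.
output "network_interface" {
  description = "Connectivity details that other concepts may consume"
  value = {
    subnet_id       = module.subnet.subnet_id # assumes a "subnet" module in this concept
    security_groups = [module.sg.sg_id]       # assumes an "sg" module in this concept
    dns_domain      = var.dns_domain
  }
}
```

Because the consumer declares an object type, Terraform rejects a value that is missing one of the three attributes at plan time, which is what makes the contract explicit.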

Hierarchical Composition: Concepts build upon each other in a hierarchical manner:

The cluster concept uses the networking concept’s interface
The storage concept uses the networking concept’s interface
Lower-level concepts don’t reference higher-level ones
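One way to picture this hierarchy is a root module that wires the concepts together. The sketch below is an assumption about how such wiring could look, not the refactored repository's actual root module: the source paths follow the /concepts/ layout described earlier, while the output names network_interface and storage_interface and the input variables are hypothetical.

```hcl
# main.tf in the root module (illustrative wiring only)

# Lowest level: networking depends on no other concept.
module "networking" {
  source       = "./concepts/networking"
  cluster_name = var.cluster_name
}

# Storage consumes the networking interface.
module "storage" {
  source            = "./concepts/storage"
  storage_profile   = var.storage_profile
  network_interface = module.networking.network_interface
}

# Cluster consumes both interfaces; nothing below it refers back to it.
module "cluster" {
  source            = "./concepts/cluster"
  cluster_name      = var.cluster_name
  workload_profile  = var.workload_profile
  network_interface = module.networking.network_interface
  storage_interface = module.storage.storage_interface
}
```

Terraform derives the ordering from these references, so this wiring needs no depends_on arguments.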

Business Terminology: Interfaces use business terminology rather than technical terms:

variable "workload_profile" { # Business term
  type        = string
  description = "Computational profile for workloads"
}

Instead of:

variable "worker_node_instance_type" { # Technical term
  type        = string
  description = "Instance type for worker nodes"
}

Dependency Management

Original Approach:

Resource-Based Dependencies: Dependencies managed through direct resource references
Implicit Dependency Chains: Long chains of dependent resources
Terraform-Managed Dependencies: Relying on Terraform to track dependencies
depends_on Usage: Extensive use of depends_on to enforce order

Concept-Driven Approach:

Interface-Based Dependencies: Dependencies managed through well-defined interfaces
Explicit Interface Contracts: Clear declaration of what one concept needs from another
Concept-Level Abstraction: Dependencies expressed at the concept level, not resource level
Reduced Dependency Chains: Shorter, cleaner dependency chains

Variable Transformation

Technical to Business-Oriented Variables

Original Technical Variables:

variable "worker_node_instance_type" {
  type        = string
  default     = "bx2-4x16"
  description = "Specify the virtual server instance or bare metal server profile type name to be used to create the worker nodes..."
}

variable "worker_node_min_count" {
  type        = number
  default     = 0
  description = "The minimum number of virtual server instance or bare metal server worker nodes that will be provisioned at the time the cluster is created..."
}

variable "scale_storage_node_instance_type" {
  type        = string
  default     = "cx2d-8x16"
  description = "Specify the virtual server instance storage profile type name to be used to create the Spectrum Scale storage nodes..."
}

Transformed Business Variables:

variable "workload_profile" {
  type        = string
  default     = "standard"
  description = "Computational profile for the cluster (standard, compute-intensive, memory-intensive)"

  validation {
    condition     = contains(["standard", "compute-intensive", "memory-intensive"], var.workload_profile)
    error_message = "Valid values for workload_profile are: standard, compute-intensive, memory-intensive."
  }
}

variable "worker_min_node_count" {
  type        = number
  default     = 0
  description = "Minimum number of worker nodes to provision in the cluster"
}

variable "storage_profile" {
  type        = string
  default     = "standard"
  description = "Storage performance profile (standard, high-performance, ultra-high-performance)"

  validation {
    condition     = contains(["standard", "high-performance", "ultra-high-performance"], var.storage_profile)
    error_message = "Valid values for storage_profile are: standard, high-performance, ultra-high-performance."
  }
}
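A business-level profile still has to resolve to a concrete instance profile somewhere in the infrastructure layer. The sketch below shows one possible mapping using a local lookup table. Only "bx2-4x16" comes from the original defaults; the other two profile names and the local names are illustrative assumptions, not values taken from the refactored code.

```hcl
locals {
  # Hypothetical translation table from business profile to IBM Cloud
  # instance profile. "bx2-4x16" is the original default; the other two
  # entries are placeholders to be replaced with real sizing decisions.
  workload_profile_to_instance_type = {
    "standard"          = "bx2-4x16"
    "compute-intensive" = "cx2-8x16"
    "memory-intensive"  = "mx2-4x32"
  }

  # The validation block on var.workload_profile guarantees the key exists,
  # so a plain index is safe here.
  worker_node_instance_type = local.workload_profile_to_instance_type[var.workload_profile]
}
```

Keeping the table in one place means a sizing change touches a single map rather than every caller that sets a profile.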

Transformation Patterns

Abstraction:

Moving from technical details to business concepts

Original: worker_node_instance_type = "bx2-4x16"
Transformed: workload_profile = "compute-intensive"

Business Terminology:

Using domain-specific language

Original: volume_iops, volume_profile
Transformed: storage_performance, storage_tier

Semantic Grouping:

Grouping variables by business meaning

Original: Technical grouping (all network variables together)
Transformed: Semantic grouping (all cluster variables together)

Validation Enrichment:

Adding business-meaningful validations

Original: Technical validations (size limits, format checks)
Transformed: Business validations (valid workload profiles, storage tiers)

Interface Objects:

Creating structured interfaces between concepts

Original: Individual variables passed between modules
Transformed: Interface objects representing concept boundaries

Benefits of Concept-Driven Approach

Several benefits become clear when examining a concept-driven approach.

Business Alignment:

Code structure directly mirrors business concepts and capabilities
Variables and parameters use business terminology
Non-technical stakeholders can better understand the codebase
Documentation can focus on business capabilities

Reduced Coupling:

Clear boundaries between concepts reduce dependencies
Changes in one concept are less likely to ripple into others
Implementation details can change without affecting concepts
Easier to maintain and evolve individual components

Improved Maintainability:

Developers can focus on specific business domains
Code organization is more intuitive
Self-documenting structure based on business domains
Cleaner separation of concerns

Enhanced Testing Capability:

Concepts can be tested independently
Mocks can be used at concept boundaries
Business logic can be tested separately from infrastructure
Better unit testing possibilities
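As a rough illustration of testing at a concept boundary, Terraform's native test framework can exercise variable validation without creating any cloud resources. The sketch below assumes Terraform 1.7 or later (for mock_provider), a module that declares the workload_profile variable with the validation shown earlier, and a provider whose local name is "ibm". The file path and run names are hypothetical, and any other required variables would also need values.

```hcl
# tests/workload_profile.tftest.hcl (hypothetical path)

# Replace the real provider with a mock so no IBM Cloud credentials
# or API calls are needed during the test run.
mock_provider "ibm" {}

# A supported business profile should plan without validation errors.
run "accepts_known_profile" {
  command = plan

  variables {
    workload_profile = "compute-intensive"
  }
}

# An unsupported profile should be rejected by the variable's validation block.
run "rejects_unknown_profile" {
  command = plan

  variables {
    workload_profile = "gpu-heavy"
  }

  expect_failures = [var.workload_profile]
}
```

Running terraform test from the module directory executes both run blocks in order.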

Flexible Evolution:

Technical implementations can change without affecting business concepts
Potential for multi-cloud support with same concept layer
New technologies can be adopted with minimal concept changes
Business-driven rather than technology-driven evolution

Challenges in Implementation

Refactoring a codebase can be challenging. Revising the design takes time and effort from those developing or maintaining the code, and several issues deserve consideration. Some of these challenges are listed below.

Additional Abstraction Layer:

More code to maintain initially
Indirection between concepts and implementation
Learning curve for new developers
Additional complexity in the codebase

Transformation Overhead:

Business concepts must be mapped to technical resources
Extra processing to translate between layers
Potential performance impact during Terraform plan/apply
Complexity in maintaining the mapping logic

Initial Development Effort:

Higher upfront investment to design concept structure
More thought required for interface design
May slow initial development velocity
Requires domain expertise to identify the right concepts

Trade-offs

Of course, refactoring comes with costs, and trade-offs will be in play. Businesses should carefully weigh the pros and cons to decide whether a concept-driven design is a real fit.

Simplicity vs. Business Alignment:

The original structure is simpler but less aligned with business concepts
The concept-driven structure is more complex but better reflects business domains

Development Speed vs. Maintainability:

The original structure allows faster initial development
The concept-driven structure improves long-term maintainability

Technical vs. Business Focus:

The original structure focuses on technical implementation
The concept-driven structure focuses on business capabilities

Cloud Provider Coupling vs. Abstraction:

The original structure is tightly coupled to IBM Cloud
The concept-driven structure abstracts provider details
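To make the abstraction side of this trade-off concrete, a concept module can delegate to a provider-specific module while exposing only the interface object to its callers. The sketch below is an assumption about how that delegation might look: the file location and the path under /infrastructure/ibmcloud/ are hypothetical, while the subnet_id and security_group argument names are taken from the original primary_vsi example.

```hcl
# concepts/cluster/primary/main.tf (hypothetical location)

variable "network_interface" {
  type = object({
    subnet_id       = string
    security_groups = list(string)
    dns_domain      = string
  })
}

# Delegate to the IBM Cloud implementation. The concept's callers see only
# network_interface; the provider-specific arguments stay in this one place.
module "primary_vsi" {
  source = "../../../infrastructure/ibmcloud/compute/primary_vsi" # assumed path

  subnet_id      = var.network_interface.subnet_id
  security_group = var.network_interface.security_groups
  # ...
}
```

A module's source must be a literal string, so targeting a different provider would mean changing this source line, or adding a sibling module, rather than passing a variable.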

Conclusion

The concept-driven approach requires more initial investment but yields significant long-term benefits in maintainability, business alignment, and flexibility. It creates a more resilient, understandable, and evolvable codebase that can adapt to changing business needs and technical requirements.

By separating business concepts from technical implementation, the restructured code is better positioned for future changes, whether they’re driven by business requirements or technological shifts. This approach makes the codebase more accessible to both technical and non-technical stakeholders, creating a shared language between business and development teams.

This article was written in conjunction with Claude 3.7 Sonnet. Using Anthropic’s Claude Code command-line tool, I asked it to analyze and refactor the Terraform codebase for implementing IBM Spectrum Symphony on IBM Cloud in light of the paper Concept-Centric Software Development: An Experience Report by Peter Wilczynski, Taylor Gregoire-Wright, and Daniel Jackson (https://doi.org/10.48550/arXiv.2304.14975).

Originally posted on LinkedIn