Terraform Codebase Analysis and Refactoring
Below is a detailed analysis comparing the original resource-driven organization of the Terraform code with a concept-driven restructuring, along with suggestions for how it might be refactored.
Project Structure Analysis: Original Terraform vs. Concept-Driven Structure
Original Structure
The Terraform design for an IBM Cloud implementation of IBM Spectrum Symphony is publicly available. It is organized around technical resources with a provider-first approach, which makes it straightforward to map onto IBM Cloud services, and it works very well as-is. However, for anyone unfamiliar with cloud implementations who is wading into this code for the first time, the existing design can be challenging to understand and work with.
In the current implementation, the root directory contains the top-level Terraform files (vpc.tf, variables.tf, outputs.tf), and the remaining directories are organized as shown below, with each component divided by the technical resources it uses, in line with IBM Cloud requirements. This is conceptual to a degree, but it doesn’t reveal much of the business purpose.
/resources/ibmcloud/ -- Provider-specific modules categorized by resource type:
  /compute/ -- Compute resources (primary_vsi, secondary_vsi, etc.)
  /network/ -- Network resources (vpc, subnet, floating_ip, etc.)
  /security/ -- Security resources (security_group, security_group_rule, etc.)
/resources/scale_common/ -- Common modules for Spectrum Scale
/resources/windows/ -- Windows-specific modules
/scripts/ -- User data scripts and templates
Concept-Driven Structure
By way of contrast, a more deliberate concept-driven structure is organized around business concepts with a domain-first approach. What this Terraform code really does is build a high-performance compute cluster on IBM Cloud so that banks and investment firms can run efficient risk modeling and financial analysis. For those who don’t know, IBM Spectrum Symphony delivers enterprise-class management for running compute- and data-intensive distributed applications on a scalable, shared grid. This enables a dynamic compute platform where high-throughput tasks run across the grid in an orchestrated fashion, as efficiently as possible.
It’s also important to understand that this implementation code is part of a larger family of Spectrum products whose Terraform designs share common patterns. This article outlines a potential path forward that could extend to other IBM Cloud repositories, in much the same way that Peter Wilczynski’s paper describes how concept-driven development helped address the complexity inherent in Palantir Technologies’ extensive codebase. As we look at the refactored Terraform code that Claude 3.7 Sonnet produced for IBM Spectrum Symphony, we’ll see that it resembles the original design but contains important concept-driven differences, as the directory structure shows:
/concepts/ contains business domain components:
  /cluster/ - Primary, secondary, management, and worker node concepts
  /storage/ - NFS and Spectrum Scale storage concepts
  /networking/ - VPC, DNS, and security concepts
  /access/ - Bastion host and authentication concepts
  /workload/ - Scheduler and auto-scaling concepts
/infrastructure/ contains technical implementations:
  /ibmcloud/ - IBM Cloud provider-specific implementations
  /scripts/ - Implementation scripts
  /common/ - Common utilities and helpers
Key Organization Principles
Original Approach:
- Provider-Centric: IBM Cloud-specific implementations at the forefront
- Resource Type Organization: Grouped by technical resource categories (compute, network, security)
- Implementation-First Design: Exposes technical details at the top level
- Tightly Coupled Modules: Direct references between modules create high coupling
- Technical Naming: Variables and parameters use technical terminology
Concept-Driven Approach:
- Business Domain Organization: Grouped by business concepts (cluster, storage, networking)
- Provider Abstraction: Technical details isolated in infrastructure layer
- Business-First Design: Exposes business concepts at the top level
- Interface-Based Communication: Well-defined interfaces between concepts with loose coupling
- Business Terminology: Variables and parameters use business language
Main Modules and Their Purposes
Original Structure:
- Compute Modules: Implement specific VM types (primary, secondary, management, worker nodes)
- Network Modules: Implement network resources (VPC, subnets, DNS, floating IPs)
- Security Modules: Implement security resources (security groups, rules, VPN)
- Storage Modules: Implement storage solutions (NFS, Spectrum Scale)
- Scale Common Modules: Implement Spectrum Scale integration
Concept-Driven Structure:
- Cluster Concept: Coordinates all cluster nodes (primary, secondary, management, worker)
  Technical Resources: Primary, secondary, management, and worker VMs
  Purpose: Provides computational capacity and orchestration
- Storage Concept: Manages storage options (NFS, Spectrum Scale)
  Technical Resources: NFS or Spectrum Scale storage
  Purpose: Enables data persistence and sharing
- Networking Concept: Handles networking capabilities (VPC, DNS, security)
  Technical Resources: VPC, subnets, DNS, security groups
  Purpose: Delivers connectivity and isolation
- Access Concept: Manages access control (bastion, authentication)
  Technical Resources: Bastion host, SSH keys
  Purpose: Facilitates secure entry and authentication
- Workload Concept: Handles workload management (scheduler, scaling)
  Technical Resources: Symphony host factory, auto-scaling
  Purpose: Manages job scheduling and dynamic capacity
Concept Representation
In the concept-driven approach, these technical resources are represented through business concepts:
Cluster Concept:
- Represents the HPC cluster as a whole
- Encapsulates primary, secondary, management, and worker nodes
- Defines cluster-wide properties like clustering software and orchestration
Storage Concept:
- Abstracts storage capabilities independent of implementation
- Provides interfaces for both NFS and Spectrum Scale options
- Defines business-relevant properties like performance profiles and tiers
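One way the storage concept could hide the NFS vs. Spectrum Scale choice is sketched below. It is a minimal sketch, assuming a hypothetical `storage_type` input, hypothetical implementation modules under `/infrastructure/ibmcloud/`, and a hypothetical `mount_point` output on each of them. Only the module selected by `count` is created, and callers see a single `storage_interface` either way.

```hcl
# concepts/storage/main.tf (hypothetical layout and names throughout)

variable "storage_type" {
  type        = string
  default     = "nfs"
  description = "Storage implementation backing the cluster (nfs or spectrum-scale)"

  validation {
    condition     = contains(["nfs", "spectrum-scale"], var.storage_type)
    error_message = "Valid values for storage_type are: nfs, spectrum-scale."
  }
}

variable "network_interface" {
  type = object({
    subnet_id       = string
    security_groups = list(string)
    dns_domain      = string
  })
}

# Exactly one implementation module is instantiated; the other has count = 0.
module "nfs" {
  source          = "../../infrastructure/ibmcloud/nfs"
  count           = var.storage_type == "nfs" ? 1 : 0
  subnet_id       = var.network_interface.subnet_id
  security_groups = var.network_interface.security_groups
}

module "spectrum_scale" {
  source          = "../../infrastructure/ibmcloud/spectrum_scale"
  count           = var.storage_type == "spectrum-scale" ? 1 : 0
  subnet_id       = var.network_interface.subnet_id
  security_groups = var.network_interface.security_groups
}

# try() falls through to the Spectrum Scale module when the NFS module was not created.
output "storage_interface" {
  value = {
    storage_type = var.storage_type
    mount_point  = try(module.nfs[0].mount_point, module.spectrum_scale[0].mount_point)
  }
}
```

Other concepts only ever read `storage_interface`, so swapping or adding an implementation stays inside this module.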
Networking Concept:
- Provides connectivity capabilities
- Manages domain name resolution
- Handles security boundaries
- Defines interfaces for other concepts to use
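On the producing side, the networking concept might publish its contract as a single output object. This sketch assumes the concept wraps `subnet` and `sg` modules exposing the `subnet_id` and `sg_id` outputs seen in the original example, plus a hypothetical `dns_domain` input; the object's shape matches the `network_interface` variable that consuming concepts declare.

```hcl
# concepts/networking/outputs.tf (hypothetical layout)
# Consumers depend on the shape of this object, not on the IBM Cloud
# resources behind module.subnet and module.sg.
output "network_interface" {
  description = "Connectivity contract consumed by the cluster and storage concepts"
  value = {
    subnet_id       = module.subnet.subnet_id
    security_groups = [module.sg.sg_id]
    dns_domain      = var.dns_domain
  }
}
```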
Access Concept:
- Manages how users access the cluster
- Handles authentication and security
- Provides secure entry points
Workload Concept:
- Manages how work is scheduled and distributed
- Handles auto-scaling capabilities
- Defines workload profiles and requirements
Interface Design
Module Communication Comparison
Original Approach:
- Direct Resource References: Modules directly reference attributes of other resources
- Implicit Dependencies: Dependencies inferred from resource references
- Tight Coupling: Changes in one module often require changes in many others
- Global Variables: Heavy use of global variables in root module
- Local Value Sharing: Extensive use of locals for sharing values between resources
Example from the original structure:
module "primary_vsi" {
  source = "./resources/ibmcloud/compute/primary_vsi"
  # Direct references to locals, data sources, and other module outputs
  image          = local.image_mapping_entry_found ? local.new_image_id : data.ibm_is_image.image[0].id
  profile        = data.ibm_is_instance_profile.management_node.name
  vpc            = data.ibm_is_vpc.vpc.id
  subnet_id      = module.subnet.subnet_id
  security_group = [module.sg.sg_id]
  # ...
}
Concept-Driven Approach:
- Interface Objects: Modules communicate through well-defined interface objects
- Explicit Contracts: Clear declarations of required inputs and outputs
- Loose Coupling: Modules depend on interfaces, not implementations
- Domain-Specific Variables: Variables defined in business terms
- Clean Boundaries: Responsibility divided along business concept lines
Example from the concept-driven structure:
module "primary" {
  source = "./primary"
  # Business concept parameters
  cluster_name     = var.cluster_name
  workload_profile = var.workload_profile
  # Interface-based dependencies
  network_interface = var.network_interface
  storage_interface = var.storage_interface
  # ...
}
Interface Design Patterns
The concept-driven structure uses several interface design patterns:
Interface Objects: Using Terraform objects to define clear interfaces between concepts:
variable "network_interface" {
  type = object({
    subnet_id       = string
    security_groups = list(string)
    dns_domain      = string
  })
}
Hierarchical Composition: Concepts build upon each other in a hierarchical manner:
- The cluster concept uses the networking concept’s interface
- The storage concept uses the networking concept’s interface
- Lower-level concepts don’t reference higher-level ones
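At the root, that hierarchy could be wired as follows. This is a minimal sketch assuming the `/concepts/` layout described earlier, a networking concept that exports a `network_interface` output, and a storage concept that exports `storage_interface`. The networking module takes nothing from the concepts built on top of it, and because each reference to an output becomes an implicit dependency, no `depends_on` is needed.

```hcl
# main.tf at the repository root (hypothetical wiring)

# Lowest layer: depends only on root-level business inputs.
module "networking" {
  source       = "./concepts/networking"
  cluster_name = var.cluster_name
}

# Storage builds on the networking contract.
module "storage" {
  source            = "./concepts/storage"
  network_interface = module.networking.network_interface
  # ...
}

# The cluster builds on both contracts. Resources that consume these values
# wait for the resources that produce them, without explicit depends_on.
module "cluster" {
  source            = "./concepts/cluster"
  cluster_name      = var.cluster_name
  workload_profile  = var.workload_profile
  network_interface = module.networking.network_interface
  storage_interface = module.storage.storage_interface
}
```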
Business Terminology: Interfaces use business terminology rather than technical terms:
variable "workload_profile" { # Business term
  type        = string
  description = "Computational profile for workloads"
}
Instead of:
variable "worker_node_instance_type" { # Technical term
  type        = string
  description = "Instance type for worker nodes"
}
Dependency Management
Original Approach:
- Resource-Based Dependencies: Dependencies managed through direct resource references
- Implicit Dependency Chains: Long chains of dependent resources
- Terraform-Managed Dependencies: Relying on Terraform to track dependencies
- depends_on Usage: Extensive use of depends_on to enforce order
Concept-Driven Approach:
- Interface-Based Dependencies: Dependencies managed through well-defined interfaces
- Explicit Interface Contracts: Clear declaration of what one concept needs from another
- Concept-Level Abstraction: Dependencies expressed at the concept level, not resource level
- Reduced Dependency Chains: Shorter, cleaner dependency chains
Variable Transformation
Technical to Business-Oriented Variables
Original Technical Variables:
variable "worker_node_instance_type" {
  type        = string
  default     = "bx2-4x16"
  description = "Specify the virtual server instance or bare metal server profile type name to be used to create the worker nodes..."
}
variable "worker_node_min_count" {
  type        = number
  default     = 0
  description = "The minimum number of virtual server instance or bare metal server worker nodes that will be provisioned at the time the cluster is created..."
}
variable "scale_storage_node_instance_type" {
  type        = string
  default     = "cx2d-8x16"
  description = "Specify the virtual server instance storage profile type name to be used to create the Spectrum Scale storage nodes..."
}
Transformed Business Variables:
variable "workload_profile" {
  type        = string
  default     = "standard"
  description = "Computational profile for the cluster (standard, compute-intensive, memory-intensive)"
  validation {
    condition     = contains(["standard", "compute-intensive", "memory-intensive"], var.workload_profile)
    error_message = "Valid values for workload_profile are: standard, compute-intensive, memory-intensive."
  }
}
variable "worker_min_node_count" {
  type        = number
  default     = 0
  description = "Minimum number of worker nodes to provision in the cluster"
}
variable "storage_profile" {
  type        = string
  default     = "standard"
  description = "Storage performance profile (standard, high-performance, ultra-high-performance)"
  validation {
    condition     = contains(["standard", "high-performance", "ultra-high-performance"], var.storage_profile)
    error_message = "Valid values for storage_profile are: standard, high-performance, ultra-high-performance."
  }
}
Transformation Patterns
Abstraction:
Moving from technical details to business concepts
- Original: worker_node_instance_type = "bx2-4x16"
- Transformed: workload_profile = "compute-intensive"
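The mapping from business term back to technical value has to live somewhere. A minimal sketch, assuming it sits in a `locals` block inside the cluster concept: `bx2-4x16` comes from the original default, while the `cx2-8x16` and `mx2-4x32` choices are illustrative assumptions to verify against the instance profiles available in the target region.

```hcl
# Hypothetical lookup table inside the cluster concept.
locals {
  workload_instance_types = {
    "standard"          = "bx2-4x16" # original default for worker_node_instance_type
    "compute-intensive" = "cx2-8x16" # assumption: compute-optimized profile
    "memory-intensive"  = "mx2-4x32" # assumption: memory-optimized profile
  }

  # The validation block on var.workload_profile restricts it to the keys
  # above, so this index cannot miss.
  worker_instance_type = local.workload_instance_types[var.workload_profile]
}
```

The infrastructure layer would then receive `local.worker_instance_type` wherever the original code used `var.worker_node_instance_type`.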
Business Terminology:
Using domain-specific language
- Original: volume_iops, volume_profile
- Transformed: storage_performance, storage_tier
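The same pattern can translate storage language into provider terms. This sketch reuses the `storage_profile` variable defined earlier and assumes the IBM Cloud VPC block-storage profiles `general-purpose`, `5iops-tier`, and `10iops-tier` as targets; the pairing is an illustrative assumption, not the refactored code's actual choice.

```hcl
# Hypothetical translation from business storage tiers to IBM Cloud VPC
# block-storage volume profiles (pairings are assumptions to verify).
locals {
  storage_volume_profiles = {
    "standard"               = "general-purpose"
    "high-performance"       = "5iops-tier"
    "ultra-high-performance" = "10iops-tier"
  }

  # Passed to the infrastructure layer in place of a raw volume_profile input.
  volume_profile = local.storage_volume_profiles[var.storage_profile]
}
```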
Semantic Grouping:
Grouping variables by business meaning
- Original: Technical grouping (all network variables together)
- Transformed: Semantic grouping (all cluster variables together)
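Semantic grouping can be expressed directly with an object-typed variable. A sketch, assuming Terraform 1.3 or later for `optional()` defaults; the attribute names and default values are hypothetical.

```hcl
# All cluster-level business settings travel together as one input.
# optional(type, default) requires Terraform 1.3+.
variable "cluster" {
  description = "Business-level settings for the compute cluster"
  type = object({
    name                  = string
    workload_profile      = optional(string, "standard")
    worker_min_node_count = optional(number, 0)
    worker_max_node_count = optional(number, 10)
  })
}
```

A caller could then set only `cluster = { name = "risk-grid" }` in a `.tfvars` file and accept the defaults for the rest.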
Validation Enrichment:
Adding business-meaningful validations
- Original: Technical validations (size limits, format checks)
- Transformed: Business validations (valid workload profiles, storage tiers)
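Business validations can also relate several values to each other. A minimal sketch with a hypothetical `worker_scaling` object: because both bounds live in the same variable, a single `validation` block can check that the scheduler's range makes sense, even on Terraform versions that only let a validation refer to its own variable.

```hcl
# Hypothetical scaling range for the workload concept.
variable "worker_scaling" {
  description = "Range of worker nodes the scheduler may scale between"
  type = object({
    min_nodes = number
    max_nodes = number
  })
  default = {
    min_nodes = 0
    max_nodes = 10
  }

  validation {
    condition     = var.worker_scaling.min_nodes >= 0 && var.worker_scaling.max_nodes >= var.worker_scaling.min_nodes
    error_message = "worker_scaling needs min_nodes >= 0 and max_nodes >= min_nodes."
  }
}
```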
Interface Objects:
Creating structured interfaces between concepts
- Original: Individual variables passed between modules
- Transformed: Interface objects representing concept boundaries
Benefits of Concept-Driven Approach
Several benefits become clear when examining a concept-driven approach.
Business Alignment:
- Code structure directly mirrors business concepts and capabilities
- Variables and parameters use business terminology
- Non-technical stakeholders can better understand the codebase
- Documentation can focus on business capabilities
Reduced Coupling:
- Clear boundaries between concepts reduce dependencies
- Changes inside one concept rarely affect others as long as its interface stays stable
- Implementation details can change without affecting concepts
- Easier to maintain and evolve individual components
Improved Maintainability:
- Developers can focus on specific business domains
- Code organization is more intuitive
- Self-documenting structure based on business domains
- Cleaner separation of concerns
Enhanced Testing Capability:
- Concepts can be tested independently
- Mocks can be used at concept boundaries
- Business logic can be tested separately from infrastructure
- Better unit testing possibilities
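Terraform's built-in test framework is one way to exercise a concept in isolation. A sketch, assuming Terraform 1.7 or later (for `mock_provider`), a cluster concept that declares the `workload_profile` variable shown in this article plus a hypothetical `worker_instance_type` output, and that any other required inputs are defaulted or supplied. The expected `cx2-8x16` value is an assumed mapping, and the IBM Cloud provider is mocked so no credentials or real resources are involved.

```hcl
# concepts/cluster/tests/workload_profile.tftest.hcl (hypothetical)

# Replace the IBM Cloud provider with a mock at the concept boundary.
mock_provider "ibm" {}

run "rejects_unknown_workload_profile" {
  command = plan

  variables {
    workload_profile = "gpu-heavy"
  }

  # The run passes only if this variable's validation fails.
  expect_failures = [
    var.workload_profile,
  ]
}

run "maps_compute_profile_to_instance_type" {
  command = plan

  variables {
    workload_profile = "compute-intensive"
  }

  assert {
    condition     = output.worker_instance_type == "cx2-8x16"
    error_message = "compute-intensive should map to the cx2-8x16 instance profile."
  }
}
```

Running `terraform test` from the concept's directory would pick up files in its `tests/` folder and execute both runs.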
Flexible Evolution:
- Technical implementations can change without affecting business concepts
- Potential for multi-cloud support with same concept layer
- New technologies can be adopted with minimal concept changes
- Business-driven rather than technology-driven evolution
Challenges in Implementation
Implementing a refactored codebase can be challenging, and revising the design will take time and effort from those who develop and maintain the code. Some of these challenges are listed below.
Additional Abstraction Layer:
- More code to maintain initially
- Indirection between concepts and implementation
- Learning curve for new developers
- Additional complexity in the codebase
Transformation Overhead:
- Business concepts must be mapped to technical resources
- Extra processing to translate between layers
- Potential performance impact during Terraform plan/apply
- Complexity in maintaining the mapping logic
Initial Development Effort:
- Higher upfront investment to design concept structure
- More thought required for interface design
- May slow initial development velocity
- Requires domain expertise to identify the right concepts
Trade-offs
Of course, refactoring comes with costs, and trade-offs are in play. Teams should weigh the pros and cons carefully and consider whether a concept-driven design is a real fit before making the switch.
Simplicity vs. Business Alignment:
- The original structure is simpler but less aligned with business concepts
- The concept-driven structure is more complex but better reflects business domains
Development Speed vs. Maintainability:
- The original structure allows faster initial development
- The concept-driven structure improves long-term maintainability
Technical vs. Business Focus:
- The original structure focuses on technical implementation
- The concept-driven structure focuses on business capabilities
Cloud Provider Coupling vs. Abstraction:
- The original structure is tightly coupled to IBM Cloud
- The concept-driven structure abstracts provider details
Conclusion
The concept-driven approach requires more initial investment but yields significant long-term benefits in maintainability, business alignment, and flexibility. It creates a more resilient, understandable, and evolvable codebase that can adapt to changing business needs and technical requirements.
By separating business concepts from technical implementation, the restructured code is better positioned for future changes, whether they’re driven by business requirements or technological shifts. This approach makes the codebase more accessible to both technical and non-technical stakeholders, creating a shared language between business and development teams.
The above article was written in conjunction with Claude 3.7 Sonnet, which I asked, using Anthropic’s Claude Code command-line tool, to analyze and refactor the Terraform codebase for implementing IBM Spectrum Symphony on IBM Cloud in light of the paper Concept-Centric Software Development: An Experience Report by Peter Wilczynski, Taylor Gregoire-Wright, and Daniel Jackson (https://doi.org/10.48550/arXiv.2304.14975).