Treating All of Your Code as a Valuable IP Asset

2021-10-21 - 8 min read
Daniel Young
Founder, DRYCodeWorks

The tools that build, test, version, release, and configure hosts for your application code are technical assets of your business. In the same way an auto shop owns mechanic's tools, car lifts, and garage bays, a software company often owns intellectual property that defines things like cloud infrastructure, data schemas, and system designs. The artifacts that encode these IP assets are often a combination of code, documentation, and institutional knowledge.

Bringing forward as much of this IP as possible and preserving it in archivable formats is essential for the long-term health of a software company. While institutional knowledge will always be an invaluable resource, it is also a dangerous liability if not routinely transcribed. There are also cognitive limits to how much knowledge people can hold in their heads, and those limits don't apply once that knowledge is transferred to other mediums.

At DRYCodeWorks, we often see three key areas where companies benefit greatly from consistently transcribing institutional knowledge into more appropriate forms of record.

Infrastructure as Code

What is it?

Infrastructure as Code (IaC) - a declarative syntax for describing the infrastructure of a software system. E.g.: server_a - 64GB RAM - 8 CPU - us-east-1, etc.

IaC provides a well-documented, version-controlled way to manage your infrastructure using code. You define your infrastructure in a text file, then use a tool to create and manage your infrastructure resources based on that definition.
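
The idea of managing resources from a definition can be sketched in a few lines of Python. This is a minimal illustration, not how any particular IaC tool works: the `Server` type, the `plan` function, and the `server_b` and `server_c` entries are hypothetical names invented for the example. It compares a desired definition against what currently exists and lists the changes needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical resource description, mirroring the server_a example."""
    ram_gb: int
    cpus: int
    region: str


def plan(desired: dict[str, Server], actual: dict[str, Server]) -> list[tuple[str, str]]:
    """Return the (action, name) steps needed to make `actual` match `desired`."""
    steps = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            steps.append(("create", name))
        elif name not in desired:
            steps.append(("delete", name))
        elif desired[name] != actual[name]:
            steps.append(("update", name))
    return steps


# The definition you would keep in version control.
desired = {
    "server_a": Server(ram_gb=64, cpus=8, region="us-east-1"),
    "server_b": Server(ram_gb=16, cpus=4, region="us-east-1"),
}
# What is (hypothetically) running right now.
actual = {
    "server_a": Server(ram_gb=32, cpus=8, region="us-east-1"),
    "server_c": Server(ram_gb=16, cpus=4, region="us-east-1"),
}

print(plan(desired, actual))
# [('update', 'server_a'), ('create', 'server_b'), ('delete', 'server_c')]
```

Because the plan is derived from the definition rather than from a sequence of manual steps, running it against infrastructure that already matches yields an empty plan.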

Infrastructure as code has been a transformative tool since the inception of Puppet in 2005, and through its many successors (Terraform, CloudFormation, etc.). It has reduced the need for costly server management teams and put the power of the cloud squarely in the hands of the engineers who need it.

Why is it Important?

First and foremost, IaC acts as fantastic documentation for non-trivial infrastructures. It's also a powerful tool for orchestrating complex, interdependent changes across multiple parts of your infrastructure.

High-quality IaC tools:

  • Show the entire configurable surface of a cloud resource
  • Let you represent dependencies between resources directly as code
  • Make it easy to deploy the same infrastructure in multiple isolated environments
  • Make it easy to scale expensive resources in and out to meet demand
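
As a rough sketch of two of these points, dependencies as code and repeatable environments, the Python below orders a set of resources so that each is created after the things it depends on, then reuses the same definition for two environments. The resource names and the `creation_order` and `environment_plan` helpers are hypothetical, and real IaC tools do far more than this.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each key depends on the resources in its set.
dependencies = {
    "network": set(),
    "database": {"network"},
    "app_server": {"network", "database"},
    "load_balancer": {"app_server"},
}


def creation_order(deps: dict[str, set[str]]) -> list[str]:
    """Order resources so each is created after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


def environment_plan(env: str, deps: dict[str, set[str]]) -> list[str]:
    """Reuse one definition for an isolated environment by prefixing names."""
    return [f"{env}-{name}" for name in creation_order(deps)]


print(environment_plan("staging", dependencies))
# ['staging-network', 'staging-database', 'staging-app_server', 'staging-load_balancer']
print(environment_plan("production", dependencies))
```

The dependency graph lives in one place, so staging and production cannot drift apart in the order or shape of what gets created.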

With the power of modern IaC tooling, engineers can spend more time creating and obsessing over customers and less time pushing the proverbial boulder up the hill.

Problems at companies without it

Cloud infrastructure can be brutally complicated to manage. It's difficult to document, the dependencies are often hard to understand, security surfaces are often opaque, and most cloud providers have yet to provide a console experience that allows for idempotent, multi-resource changes. The ability to change more than one part of a cloud orchestration in one transaction is an ever-growing need, and IaC provides us with a path forward.

Companies that don't have IaC to orchestrate their infrastructure often:

  • Invest lots of time in the deployment of new servers / infrastructure to handle new customer demand
  • Experience recurring, hard-to-trace, intermittent bugs caused by unstable deployment processes
  • Find iterating on infrastructure to be an intractable problem
  • Rely wholly on institutional knowledge to manage infrastructure
  • Struggle profoundly with disaster recovery

Why is IaC a valuable IP Asset?

IaC is one of the most empowering tools in modern software development. It can be used as a configuration management tool and, when integrated into CI/CD, lets infrastructure changes be reviewed, tested, and released together with application code.

While empowering software engineers does tend to lessen attrition, companies with IaC can sleep easier knowing that if their engineers do move on, the institutional knowledge they possess is safely encoded alongside the application code.

Define and Version Database Schema

What is it?

Database Schema as Code - the practice of managing database schemas in a text-based format, using a language or format such as SQL, YAML, or JSON.

One of the most obvious technical assets of a software company is its data. What's less obvious is that structured data is almost always of significantly more value than unstructured data.

Consider Facebook: the near-trillion-dollar business was built not on the back of everyone's personal information, but on the powerful relationships the company drew from that information.

Why is it Important?

Versioned database schemas allow us to track the current, past, and proposed versions of a data model. They allow us to quickly develop features which the current model doesn't yet support. They let us move the model up or down a version to roll features forward and back. There's much overlap here with the benefits of both IaC and CI/CD.

When properly implemented, Database Schema as Code:

  • Makes it easier for developers to collaborate on database changes and communicate those changes to other stakeholders.
  • Makes it easier to deploy database changes quickly and safely, both forward and backward.
  • Helps to reduce the risk of errors in database changes, which can be very high
  • Provides a framework for idempotent and reversible change management, allowing us to iterate on the database in a safe and reliable way
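
The forward-and-backward, safe-to-rerun behaviour described above can be illustrated with a small Python sketch using the standard library's `sqlite3` module. It is a toy under stated assumptions, not how any particular migration tool is implemented: the `MIGRATIONS` list, the `schema_version` table, and the `users` and `orders` tables are all hypothetical.

```python
import sqlite3

# Hypothetical migrations: (version, upgrade SQL, downgrade SQL).
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)",
        "DROP TABLE users"),
    (2, "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id))",
        "DROP TABLE orders"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Read the highest applied version, creating the bookkeeping table if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn: sqlite3.Connection, target: int) -> int:
    """Move the schema up or down to `target`; a no-op if already there."""
    version = current_version(conn)
    if target > version:
        for v, up, _ in MIGRATIONS:
            if version < v <= target:
                with conn:  # commits the bookkeeping row after each step
                    conn.execute(up)
                    conn.execute("INSERT INTO schema_version VALUES (?)", (v,))
    else:
        for v, _, down in reversed(MIGRATIONS):
            if target < v <= version:
                with conn:
                    conn.execute(down)
                    conn.execute("DELETE FROM schema_version WHERE version = ?", (v,))
    return current_version(conn)


def tables(conn: sqlite3.Connection) -> list[str]:
    """List table names so the effect of a migration is visible."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
print(migrate(conn, 2), tables(conn))  # 2 ['orders', 'schema_version', 'users']
print(migrate(conn, 2), tables(conn))  # running again changes nothing
print(migrate(conn, 1), tables(conn))  # 1 ['schema_version', 'users']
```

Because the database records which version it is on, the same command can be run in every environment and only the missing steps are applied.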

Problems at companies without it

Database management has been an area of active development for nearly 50 years, yet many companies still fail to use the excellent tools that have come to dominate the space. Schema changes still require team meetings, shelling into servers, and running often tremendously risky SQL statements on the production server with no rollback plan. These strategies require patience, extreme diligence, and frankly, a strong stomach. At DRYCodeWorks, we prefer a more predictable approach, using tools like Flyway, Alembic, and Liquibase.

Companies that don't use Database Schema management tools often have:

  • Difficulty tracking database changes, leading to errors and inconsistencies in the database.
  • Less ability to "feature flag" based on schema version
  • An increased risk of database outages
  • Reduced agility
  • Compliance challenges, as some regulatory bodies require database changes to be documented and auditable

Why are Database Schemas a valuable IP Asset?

Knowing the lifecycle of your data model is valuable. Having easily surfaced documentation of the state of your data makes it easier to collaborate and scale. If database changes are made ad hoc, recovering the institutional knowledge lost when the responsible engineer leaves can be excruciating. It's better to have database schemas live where they belong: alongside your application code.

Distributed Systems Designs

Replication, Resiliency, and Reliability

Most complex software applications being built today consist of multiple, interconnected computers collaborating to solve complex business needs. The study of these "distributed systems" is an active area of ongoing research among computer scientists, and proficiency in this domain is the cornerstone of software engineering at all of the major technology companies.

Building and working in a distributed system has its advantages. It increases the flexibility of a system, provides opportunities for redundancies at different parts of the system, and allows developers to leverage product offerings from platform providers who have solved some of the hardest known problems in computer science.

The applications we can build today with the tools available in the cloud dwarf anything we could have built before, but with that comes a different kind of complexity, and more need for documentation, infrastructure as code, and design documents to express these systems.

What is it?

Distributed System - a collection of computer programs that use computational resources across multiple, separate computation nodes to achieve a common, shared goal.

Distributed systems may consist of multiple, separate microservices that share a database or multiple databases; they may have one or many event-driven systems, which respond to incoming messages and process those messages in one or many ways, once or many times; they may have big "warehouses" of unstructured data; relationships between networks; the list is endless.
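
When a message may be delivered more than once, a common way to keep processing safe is to make the handler idempotent. The Python below is a minimal in-memory sketch of that idea, assuming each message carries a unique ID. The `Message` and `IdempotentConsumer` names are hypothetical, and a real system would persist the set of processed IDs rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    message_id: str  # hypothetical unique ID assigned by the producer
    amount: int


@dataclass
class IdempotentConsumer:
    processed_ids: set = field(default_factory=set)
    total: int = 0

    def handle(self, message: Message) -> bool:
        """Apply the message once; return False for a redelivered duplicate."""
        if message.message_id in self.processed_ids:
            return False
        self.total += message.amount
        self.processed_ids.add(message.message_id)
        return True


consumer = IdempotentConsumer()
# "m1" is delivered twice, as can happen when a queue retries.
deliveries = [Message("m1", 10), Message("m2", 5), Message("m1", 10)]
results = [consumer.handle(m) for m in deliveries]
print(results, consumer.total)  # [True, True, False] 15
```

Decisions like this one (which messages can be retried, and what makes them unique) are exactly the kind of design knowledge worth writing down.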

This complexity comes with a lot of advantages in a well-built distributed system:

  • Most infrastructure is elastically scalable
  • Single points of failure can be eliminated, improving uptime
  • Levers can be pulled to throttle compute resources, like the gas pedal in a car
  • Resources are largely decoupled by design, making it easier to swap out individual components

Communicating these systems to other developers is tricky. At DRYCodeWorks, we tend to use a combination of engineering diagrams, design documents (for which you can find a template here), and Infrastructure as Code.

Why is it important?

While not every system needs to be built this way, the vast majority of modern software applications have some amount of distribution in their design.

A distributed, replicated design:

  • Protects us from data loss during downtime (and data loss can kill a company)
  • Allows us to perform database administration without data loss during the maintenance window
  • Can offer guaranteed processing of writes
  • Provides elasticity: we can scale and modify individual pieces of the system without touching those that don't need to change
  • Can respond to spiky traffic, which serverless offerings can handle well
  • Lets applications and databases scale horizontally, which isn't possible without replication

Problems of companies that lack System Design Documentation

At many companies we've worked with, we've seen a lack of documentation around the shape of the system. This tends to happen because, as the system grows organically in the cloud console, developers add resources, configure networks, set up permissions, etc., and neglect to record their architectural decisions. This creates a knowledge bottleneck for new developers trying to engage with the system at large, and leads to complexity and duplicated work.

At companies that lack sufficient design documentation, we see:

  • Intermittent data loss
  • Difficulty scaling runtime resources
  • Individual bugs in one part of the system leading to catastrophic system failure (due to resource coupling)
  • Duplication of work, leading to costly compute bills and increasingly expensive technical debt
  • A lack of cross-region replication leading to availability problems
  • A lack of database replication, making the database a catastrophic point of failure in the event of an outage, and compromising availability and data consistency in high traffic situations

To reduce the problem to its simplest explanation, complexity in a system can be good (it gives you a lot of useful paradigms to work in that can solve equally complicated problems), but only if that complexity is wielded knowingly and responsibly. Once systems pass a certain threshold of complexity, they become too much for any individual to manage in their entirety, and overlooking this liability can have tremendous consequences.

Why are Distributed System Designs a valuable IP Asset?

When you spend the time to document your system designs, you empower engineers to piggyback off the work of their predecessors. This creates opportunities to reduce cost (both by saving engineering time, and reducing the likelihood of duplicated resources), harden systems (by surfacing problems that may otherwise go unnoticed), and build confidence in the integrity of your application.

Final Thoughts

IaC is a powerful tool that can help small startups manage their infrastructure more effectively. However, IaC can be complex to implement, and you need a good understanding of your infrastructure before you start using it. As such, it's usually best to start with a comprehensive audit of your system design.

Your technical intellectual property is one of the most valuable assets your company owns, and ensuring that IP is intelligible and secure can be a powerful investment to bring to potential investors.

At DRYCodeWorks, we specialize in creating many of these IP assets. As certified AWS SAs, and former AWS employees, we have a deep knowledge not only of distributed systems engineering at large, but of the tools available in the AWS ecosystem. Set up a call with us today to find out how we can help you ensure the longevity of your business by transcribing your intellectual property.