The cloud has scrambled the context defenders are accustomed to leaning on to understand the attack surface. Attackers no longer move along a linear network plane, from one asset to another, where visibility can be traced at a predictable layer of the network stack. In the cloud, every move an attacker makes needs to be understood in relation to the cloud infrastructure they are operating on.
In this post I aim to clarify the unique approaches needed to defend cloud systems by discussing the architecture underpinning the cloud, the resulting threat model, and finally, how attackers abuse such systems.
First, I briefly review classic on-prem architecture and the inflection points attackers seek to exploit. I then juxtapose that traditional tech stack with the architecture of your cloud service provider (CSP), walking through the basics of cloud architecture, the new threat model that emerges and, consequently, the steps attackers take to infiltrate cloud-deployed resources. To wrap things up, once we are on firm ground as to why the cloud is different, I highlight a cloud-native way in which attackers abuse the environment, along with how defenders need to think about visibility in the cloud.
Threat Model of the Traditional Tech Stack
It can be helpful to provide a brief overview of data-center architecture even if most readers are well-versed in a traditional technology stack. Touching on the threat model of data-center architectures sets up the juxtaposition we will make later against a cloud threat model, and it also reminds us of the built-in assumptions we hold about how to protect systems.
Examining vectors of initial compromise.
- Classic architecture can be plagued with exposed management ports where attackers could gain direct access to a server.
- We also need to model the risks associated with application-layer vulnerabilities and how they might be exploited to gain access to the OS-layer.
- A couple of other common points of initial compromise in a classic system include the always-relevant phishing attacks, implants delivered via email and the exploitation of host-layer vulnerabilities.
Each of these vectors can be leveraged by an attacker either to gain an initial foothold in an environment or to progress through a system in pursuit of a goal, often to compromise the confidentiality of data.
Attacker techniques are dictated by the characteristics of the tech stack.
This might seem like an obvious observation: attackers will live off the land and adapt their methodologies to the technology stack they find themselves engaged with. Despite its simplicity, this universal truth helps explain the divergence of techniques used when threat actors target on-premises systems versus cloud infrastructure.
The on-premises systems adversaries engage with are likely running fully functional operating systems. That attack surface might be leveraged to pivot from a compromised workstation to a routable server in the victim’s datacenter.
Servers are not air-gapped; they are connected to one another via a network. It is via this network that attackers can move from host to host.
Frequently, permissive egress rules can be found within traditional data-center architecture. It is on this outbound network path that attackers seek to establish persistence through Command-and-Control tunnels and exfiltrate data out of a trusted network perimeter.
Notice that the progression of the on-premises attack documented in the above diagram is driven by the surface area available to an attacker. As we transition into discussing cloud architecture in the next section, you will see that the same rules of the road remain: the technology stack informs the tactics and techniques attackers employ to achieve their objectives.
Cloud Architecture and the New Threat Model
The cloud is built on the concept of shared infrastructure, where customers are granted granular access to certain layers of the infrastructure stack to create and maintain resources. A cloud customer has complete autonomy to create IaaS resources, use PaaS services, transfer data, and create IAM (Identity and Access Management) policy to govern access – all because of their delegated permission to a sliver of the infrastructure the cloud service providers maintain.
Access to functionality is delegated and exposed to the customer through a layer of APIs broadly referred to as the Cloud Control-Plane APIs.
All end-user interactions with a cloud environment are brokered through the Cloud Control-Plane via thousands of publicly available APIs. The control-plane APIs allow customers to perform administrative tasks like creating new environments, provisioning users, maintaining resources and accessing data stored on managed PaaS services.
The Control Plane API’s responsibility is to:
- Authorize callers, ensuring they have the correct permissions to perform the requested actions.
- Relay the action to the downstream component. An action could be to power cycle a VM, copy an object from one bucket to another or update the permissions of a user.
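The two responsibilities above can be sketched as a toy model. This is only an illustration under stated assumptions, not how any CSP implements its control plane: the `ControlPlane` class, the grant table, the action name `vm:PowerCycle` and the principal names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class ControlPlane:
    """Toy control plane: authorize the caller, then pass the action downstream."""

    # Hypothetical grant table: principal -> set of (action, resource) pairs.
    grants: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)
    # Hypothetical downstream components, keyed by action name.
    handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def handle(self, principal: str, action: str, resource: str) -> str:
        # Step 1: authorize. Deny unless an explicit grant exists.
        if (action, resource) not in self.grants.get(principal, set()):
            raise PermissionError(f"{principal} may not {action} on {resource}")
        # Step 2: hand the authorized action to the downstream component.
        return self.handlers[action](resource)


plane = ControlPlane(
    grants={"dev-alice": {("vm:PowerCycle", "vm-1234")}},
    handlers={"vm:PowerCycle": lambda vm: f"power-cycled {vm}"},
)
result = plane.handle("dev-alice", "vm:PowerCycle", "vm-1234")
```

The point of the sketch is that the only gate between a caller and the downstream action is the permission check, which is why customer-configured authorization matters so much in what follows.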
The cloud is powerful!
By exposing all functionality via a set of well-known, public APIs, the cloud lets businesses achieve speed and scale like never before. Building on the cloud is like pouring gasoline on your development cycles, and it is why the great migration to the cloud is underway in earnest in all sectors, despite the often-high cost of migration and ongoing cloud infrastructure costs.
Given this new paradigm, how should we model the threats facing data stored in the cloud?
Here I find it useful to focus on initial compromise because it is a great lens to highlight the similarities and the differences between on-prem and cloud threat models.
A couple of vectors of initial compromise in the cloud should feel familiar.
- Initial compromise in the cloud can occur due to open management ports on IaaS resources. We are all familiar with an open SSH or RDP port attracting unwanted attention. In the cloud, those risks remain.
- Also, application-layer vulnerabilities are still entirely relevant. Insecure code deployed on public-facing web applications at a minimum causes disruption to business operations and, at worst, gives attackers a foothold in your DMZ.
Any on-prem experience you have preventing and detecting initial compromise via these two vectors will serve you well in the cloud. The rules around prevention and detection might take a slightly cloud-native bent but are fundamentally the same when occurring in a cloud environment.
But what about the control-plane APIs? These are public endpoints where authorization is configurable by the customer. This attack surface is completely novel, and it is where the savvy attacker will take advantage of the conveniences of the cloud to further their goals.
Let’s examine what an attack progression might look like when an attacker leverages control-plane APIs as opposed to an on-premises attack surface (see diagram):
- Initial compromise via phishing is a popular tactic of adversaries because it frequently works.
- The impact of harvested credentials can shift to the cloud when credentials are used to authenticate and authorize activity in cloud environments.
- It is unlikely that the credentials obtained during initial compromise provide a direct path to the adversary's target. As a result, one of many tried-and-true privilege escalation techniques in the cloud might be employed to obtain additional permissions.
- Campaigns will often look to establish some form of persistence. On the cloud control plane, that will look significantly different than on-prem. Persistence in the cloud often looks like the backdooring of account access through the manipulation of IAM policy.
- Walking through this attack progression, we are now at a point where access to the target has been obtained. In the cloud this simply means gaining the appropriate IAM permissions to access the cloud-hosted data.
- And finally, in this scenario, actions on objectives are the exfiltration of data out of the environment. Again, cloud control-plane APIs are used to transfer data from the victim’s environment to an attacker-controlled environment.
This entire attack sequence, from initial compromise to impact, was orchestrated through the publicly available APIs of the cloud service provider. At no point did the network layer or host layer come into play, so no preventative controls or sensors on a network were even possible.
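Because persistence in this progression takes the form of manipulated IAM policy, one defensive check is to look for principals that should not be trusted. The following is a minimal sketch, assuming an AWS-style JSON trust policy; the function name `external_principals` and both account IDs are hypothetical placeholders.

```python
from typing import Any, Dict, List, Set


def external_principals(policy: Dict[str, Any], trusted_accounts: Set[str]) -> List[str]:
    """Return AWS principals in Allow statements that fall outside trusted accounts."""
    findings: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":  # anyone at all
            findings.append("*")
            continue
        aws = principal.get("AWS", [])
        for p in [aws] if isinstance(aws, str) else aws:
            # ARNs look like arn:aws:iam::<account-id>:...; a bare account ID also occurs.
            account = p.split(":")[4] if p.startswith("arn:") else p
            if account not in trusted_accounts:
                findings.append(p)
    return findings


# Hypothetical trust policy with one expected and one unexpected principal.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::111122223333:root",
                    "arn:aws:iam::999988887777:root",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}
findings = external_principals(trust_policy, trusted_accounts={"111122223333"})
```

A check like this only sees principals expressed as AWS accounts or ARNs; service and federated principals, and policy conditions, would need separate handling.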
Cloud-Native Data Exfiltration
A key feature of any CSP is its backbone network. What is a backbone network?
- It is the service layer of the cloud service provider – used for operational background tasks of the CSP, the back-channel network used to communicate with the multi-tenant infrastructure and maintain availability.
- The backbone also refers to the network used by the CSP to transfer customer data as opposed to hauling bytes over the open web.
This backbone network results in many managed services such as cloud storage repositories being automatically routable to all other storage repositories of the CSP.
Tactically speaking, if you want to move data from one S3 bucket to another S3 bucket, all that is required is the IAM permissions to do so. The network path is already carved out over the CSP's backbone network.
As the cloud consumer, it is not possible to implement any network restrictions around the data which lives in cloud-native storage [1], and you do not have visibility into the network over which it travels.
For example, it is not possible to ingest any network-layer logs to capture the traffic between two S3 buckets.
This makes an attractive set of circumstances for an attacker motivated to exfiltrate data from a cloud environment.
Should they gain the appropriate IAM permissions, data can be moved from a victim’s bucket to a bucket in an attacker-controlled account by submitting layer 7 API requests to the cloud control-plane.
To execute on this, an attacker interacts solely with the publicly available cloud control-plane APIs and leverages the CSP backbone network, a pre-configured network route that is not accessible to the customer.
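To make concrete how little is involved, here is a minimal sketch of a bucket-to-bucket copy expressed as a single API request. It assumes a client exposing a `copy_object` method with the `Bucket`, `Key` and `CopySource` parameters, as the boto3 S3 client does; `RecordingClient`, the bucket names and the object key are hypothetical stand-ins so the example runs offline.

```python
from typing import Any, Dict, List


def copy_between_buckets(s3_client: Any, src_bucket: str, dst_bucket: str, key: str) -> Dict[str, Any]:
    """Ask the storage service to copy one object server-side.

    No bytes pass through the caller; the request only names a source and a destination.
    """
    return s3_client.copy_object(
        Bucket=dst_bucket,
        Key=key,
        CopySource={"Bucket": src_bucket, "Key": key},
    )


class RecordingClient:
    """Hypothetical stand-in for an S3 client; it only records the request."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def copy_object(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {"ResponseMetadata": {"HTTPStatusCode": 200}}


client = RecordingClient()
response = copy_between_buckets(client, "victim-bucket", "other-account-bucket", "reports/q3.csv")
```

Nothing in the request refers to a network path, a host or a port. Whether it succeeds is decided entirely by the permissions attached to the caller and to the two buckets.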
Visibility on the Control-Plane
Data moved from one bucket to another does not leave the kind of trail most defenders are accustomed to.
- Network-layer logs, which might reveal the packets of data moving from your bucket to another, are not available to you as the cloud consumer.
- Data movement happens over the backbone network, into which cloud customers have no visibility.
- What about host-layer visibility?
- Cloud-native storage like S3 buckets, Azure storage blobs and the like are all managed services. The customer does not have access to the host or OS level as with the infrastructure as a service model. No agents can be deployed on managed services.
So that leaves us with the control-plane. None of the actions taken by an attacker could be identified with traditional sensors, but indicators of the activity do show up in the logs written by the cloud control-plane.
All actions on resources and cloud-hosted data are authorized by the cloud control-plane proxy APIs and result in some form of record.
- The actions of your developers when they create their buckets are recorded, and the normal ingestion and reading of data likewise result in corresponding events.
- Conversely, when the bad guys leverage cloud control-plane APIs, their actions are recorded as the same kinds of events.
These event records tell the story of your cloud environment, who accessed what, from where and with what credentials, but not the user’s intentions. Determining whether a particular action has malicious or benign intent requires additional context clues and often a larger lens through which to view the environment.
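As a rough illustration of reading these records, the sketch below extracts the who, what, where and credentials from an event and then applies one piece of outside context: a list of buckets the organization knows about. The record shape is loosely modeled on AWS CloudTrail; the exact field names, and which account receives the event for a cross-account copy, are assumptions to verify against your provider's documentation. The sample values are invented for the example.

```python
from typing import Any, Dict, Iterable, List, Set


def summarize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out who acted, what they did, from where, and with which credentials."""
    identity = record.get("userIdentity") or {}
    return {
        "who": identity.get("arn"),
        "what": record.get("eventName"),
        "from_where": record.get("sourceIPAddress"),
        "credentials": identity.get("accessKeyId"),
        "when": record.get("eventTime"),
    }


def copies_to_unknown_buckets(
    records: Iterable[Dict[str, Any]], known_buckets: Set[str]
) -> List[Dict[str, Any]]:
    """Flag copy events whose destination bucket is not one we recognize."""
    flagged = []
    for record in records:
        if record.get("eventName") != "CopyObject":
            continue
        # Assumption: the destination bucket is reported under requestParameters.
        destination = (record.get("requestParameters") or {}).get("bucketName")
        if destination not in known_buckets:
            flagged.append({**summarize(record), "destination": destination})
    return flagged


# Invented sample records: the same event name, with very different implications.
sample = [
    {
        "eventTime": "2023-01-10T09:00:00Z",
        "eventName": "CopyObject",
        "sourceIPAddress": "198.51.100.7",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/dev-alice", "accessKeyId": "AKIAEXAMPLE1"},
        "requestParameters": {"bucketName": "team-backups", "key": "reports/q3.csv"},
    },
    {
        "eventTime": "2023-01-10T09:05:00Z",
        "eventName": "CopyObject",
        "sourceIPAddress": "203.0.113.50",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/svc-build", "accessKeyId": "AKIAEXAMPLE2"},
        "requestParameters": {"bucketName": "other-account-bucket", "key": "reports/q3.csv"},
    },
]
flagged = copies_to_unknown_buckets(sample, known_buckets={"team-backups", "team-data"})
```

The bucket inventory stands in for the "additional context" mentioned above: the record alone cannot tell the two copies apart, since both are the same event issued with valid credentials.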
Final Take-Aways
Two points to carry with you:
- Adversaries will leverage the unique architecture of the cloud, and cloud-native services for the same reason developers use the cloud – it's fast! It scales! And the cloud control-plane APIs help them further their goals.
- The control-plane is where we can find evidence of activity, malicious or otherwise, in a cloud environment. Network-based and host-based monitoring will not provide you with the visibility you need.
[1] VPC (Virtual Private Cloud) Service Controls: Only Google Cloud offers functionality to enforce perimeters around managed services, not bound to VPCs https://cloud.google.com/vpc-service-controls#section-6