How Kubernetes Handles External Traffic, Storage, and Autoscaling

In the previous post, I covered the differences between cloud, Docker, and Kubernetes, along with the basic structure of Kubernetes. Once you understand clusters, Pods, Deployments, and Services, you start to get a sense of how applications actually run.

But from an operational standpoint, that's not where it ends.

Where should external user requests come in?
How do you preserve data even when Pods get recreated?
How do you adjust the Pod count when traffic grows?

Kubernetes handles each of these problems with separate resources. In this post, I'll focus on how external traffic is handled, how storage is managed, and how autoscaling works.

Why Service Alone Isn't Enough

A Service provides a stable access point to Pods. Because a Pod's IP can change whenever the Pod is recreated, pointing clients directly at Pods isn't reliable. Service exists to solve that problem.

But Service alone doesn't cover everything. The default Service type, ClusterIP, is only reachable from inside the cluster. The NodePort and LoadBalancer types can expose a Service externally, but they work at the connection level and can't route requests by domain or path.

In other words, the problem of finding Pods reliably inside the cluster is different from the problem of bringing external user requests into the cluster.

This is the point where resources like Ingress or the Gateway API become necessary.

How Does External Traffic Get In?

To actually serve an application, external user requests need to reach the cluster. For example, when a browser requests a particular domain or path, you need to decide which Service that request gets forwarded to.

The two main approaches used for this role are Ingress and the Gateway API.

Ingress

Ingress is a resource that defines which Service an HTTP(S) request should be routed to. For example, you can think of rules like the following.

Requests to api.example.com go to the API Service
The /web path goes to the Web Service
The /admin path goes to the Admin Service

That said, just having an Ingress resource doesn't mean traffic is automatically handled. To make Ingress rules actually work, you need an Ingress Controller.

In other words, Ingress is the rule declaration, and the Ingress Controller is the actual executor that processes those rules.
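
To make the rule idea concrete, here is a minimal Python sketch of the kind of matching the example rules above describe. It is not how any particular Ingress Controller is implemented. The Service names are hypothetical, and the precedence I chose (host-specific rules first, then the longest path prefix) is an assumption for illustration, since real controllers define their own ordering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    host: Optional[str]   # None means "any host"
    path_prefix: str      # matched on whole path segments
    service: str          # hypothetical Service name


# Hypothetical Service names mirroring the three example rules.
RULES = [
    Rule(host="api.example.com", path_prefix="/", service="api-service"),
    Rule(host=None, path_prefix="/web", service="web-service"),
    Rule(host=None, path_prefix="/admin", service="admin-service"),
]


def prefix_matches(prefix: str, path: str) -> bool:
    """Match on whole segments, so /web matches /web/x but not /website."""
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(host: str, path: str) -> Optional[str]:
    """Return the Service for a request, or None if no rule matches."""
    candidates = [
        r for r in RULES
        if (r.host is None or r.host == host)
        and prefix_matches(r.path_prefix, path)
    ]
    if not candidates:
        return None
    # Assumed precedence: rules that name a host win, then the longest prefix.
    best = max(candidates, key=lambda r: (r.host is not None, len(r.path_prefix)))
    return best.service


print(route("api.example.com", "/users"))
print(route("www.example.com", "/web/index.html"))
```

The RULES list plays the role of the Ingress resource (a declaration), and route() plays the role of the controller (the thing that actually applies it).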

What's Different About Gateway API?

Gateway API is a newer standard that emerged to address the limitations of Ingress.

Ingress has a relatively simple structure and has been widely used, but it has some limitations.

It's primarily designed around HTTP(S).
Fine-grained settings often depend on controller-specific annotations.
It's hard to cleanly separate the roles of infrastructure administrators and application developers.

Gateway API improves on these issues by providing a structure with separated roles.

Three resources typically come up.

GatewayClass
Gateway
HTTPRoute

GatewayClass

GatewayClass is a resource that defines which Gateway Controller to use. In other words, it specifies the implementation that processes traffic.

Gateway

Gateway defines which port and protocol to accept requests on. For example, you can configure it to accept requests on HTTP port 80.

HTTPRoute

HTTPRoute defines which Service requests for a given domain and path are forwarded to.

So at a high level, Gateway API breaks down the roles like this.

GatewayClass: which controller to use
Gateway: how to accept requests
HTTPRoute: where to forward the accepted requests

Thanks to this structure, infrastructure configuration and application routing rules can be managed independently of each other.
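
As a rough sketch of how the three resources point at each other, here they are written as Python dictionaries that mirror the manifest structure. Every name here (example-class, example-gateway, web-route, web-service, www.example.com, and the controller name) is a placeholder I made up, and the actual controllerName depends on which Gateway implementation you install.

```python
import json

# All names below are hypothetical placeholders.
gateway_class = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "GatewayClass",
    "metadata": {"name": "example-class"},
    # Which controller implementation processes traffic (made-up value).
    "spec": {"controllerName": "example.com/gateway-controller"},
}

gateway = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "Gateway",
    "metadata": {"name": "example-gateway"},
    "spec": {
        # Points at the GatewayClass above.
        "gatewayClassName": "example-class",
        # How requests are accepted: HTTP on port 80.
        "listeners": [{"name": "http", "protocol": "HTTP", "port": 80}],
    },
}

http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "web-route"},
    "spec": {
        # Points at the Gateway above.
        "parentRefs": [{"name": "example-gateway"}],
        "hostnames": ["www.example.com"],
        "rules": [{
            "matches": [{"path": {"type": "PathPrefix", "value": "/web"}}],
            # Where accepted requests are forwarded.
            "backendRefs": [{"name": "web-service", "port": 80}],
        }],
    },
}


def references_resolve() -> bool:
    """Check that Gateway -> GatewayClass and HTTPRoute -> Gateway names line up."""
    class_ok = gateway["spec"]["gatewayClassName"] == gateway_class["metadata"]["name"]
    route_ok = all(
        ref["name"] == gateway["metadata"]["name"]
        for ref in http_route["spec"]["parentRefs"]
    )
    return class_ok and route_ok


print(json.dumps(http_route, indent=2))
print(references_resolve())
```

The separation of roles shows up in who would own each dictionary: the first two are typically infrastructure concerns, while the HTTPRoute is something an application team can manage on its own.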

What Happens to Data When a Pod Is Recreated?

In Kubernetes, Pods aren't permanent instances. They can be recreated at any time for various reasons such as node failure, rescheduling, or updates.

This also means data stored only inside a Pod can disappear.

For example, if your application stores uploaded files, logs, cache, or state data only on the Pod's internal filesystem, that data can vanish along with the Pod when it gets replaced.

So when data needs to persist, you have to use storage outside of the Pod.

How Are StorageClass, PV, and PVC Related?

Three concepts come up often when dealing with storage in Kubernetes.

StorageClass
PersistentVolume (PV)
PersistentVolumeClaim (PVC)

Each of these has a different role.

StorageClass

StorageClass is a policy for how storage should be provisioned. For example, it defines criteria such as what kind of disk to use and when to bind the volume.

In other words, StorageClass is closer to "what storage should be provided under what rules."

PersistentVolume (PV)

A PV is storage that has actually been provisioned. From the cluster's perspective, it's the concrete storage that's available to use.

PersistentVolumeClaim (PVC)

A PVC is how an application requests storage. That is, when a PVC says "I need this much storage," Kubernetes binds it to an appropriate PV, and the Pod then mounts the volume by referencing the PVC.

To summarize.

StorageClass: policy for how storage is provided
PV: the actual storage
PVC: the storage request

Why Request Storage Through PVC?

The reason Pods don't specify storage directly but instead request it through a PVC is to decouple the application from the actual storage implementation.

From the application's perspective, "how much do I need" and "how do I access it" matter more than "which specific disk product to use." PVC lets you express that request declaratively.

For instance, a PVC only needs to express the following requirements.

Storage size
Access mode
Which StorageClass to use

Kubernetes handles the actual binding process from there.
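
Here is a small Python sketch that builds a PVC manifest from just those three requirements, plus the volume entry a Pod would use to reference it. The helper functions and the names uploads-pvc and local-storage are hypothetical, and the size of 1Gi is only an example value.

```python
import json

# Only the three modes discussed in this post; Kubernetes defines others too.
ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"}


def build_pvc(name: str, size: str, access_mode: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest from the three requirements."""
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unsupported access mode: {access_mode}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_volume_for(pvc: dict, volume_name: str) -> dict:
    """Build the entry for a Pod's spec.volumes that references the PVC by name."""
    return {
        "name": volume_name,
        "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
    }


# Hypothetical names and size, for illustration only.
pvc = build_pvc("uploads-pvc", "1Gi", "ReadWriteOnce", "local-storage")
print(json.dumps(pvc, indent=2))
print(json.dumps(pod_volume_for(pvc, "uploads"), indent=2))
```

Notice that the Pod side only carries the claim name. Nothing about the disk type or the PV shows up there, which is the decoupling described above.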

What Does accessModes Mean?

accessModes is a value you'll often see when defining a PVC. It indicates how the storage can be accessed.

The main modes are as follows.

ReadWriteOnce (RWO): Read/write from a single node.
ReadOnlyMany (ROX): Read-only from multiple nodes.
ReadWriteMany (RWX): Read/write from multiple nodes.

In learning environments or with local storage, ReadWriteOnce is the most common choice. Local disk-based storage tends to be tied to a specific node, which makes it hard to set up configurations where multiple nodes read and write simultaneously.

Why Do StorageClass Options Matter?

StorageClass has several options related to how storage behaves. The ones you'll see most often are reclaimPolicy and volumeBindingMode.

reclaimPolicy

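reclaimPolicy determines what happens to the PV when the PVC is deleted. A PV that is dynamically provisioned from a StorageClass inherits this value, and the default is Delete.

Delete: The PV and the underlying storage are deleted along with the PVC.
Retain: The PV and its data are kept even after the PVC is deleted, and an administrator has to clean them up or reuse them manually.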

In other words, it's a policy that decides whether to automatically clean up the storage together or leave the data behind.

volumeBindingMode

volumeBindingMode determines when a PVC is bound to a PV. The default, Immediate, binds as soon as the PVC is created.

For example, WaitForFirstConsumer performs binding when a Pod that uses the PVC is actually scheduled. This approach is especially useful for local storage environments that are tied to specific nodes.

That's because if the volume is bound before the scheduler knows which node the Pod will land on, the storage can end up on a node the Pod can't actually run on.
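
To show where these two options live, here is a Python sketch that builds a StorageClass manifest. The helper function, the class name, and the provisioner string are all hypothetical. In a real cluster the provisioner depends on your storage backend.

```python
import json

RECLAIM_POLICIES = {"Delete", "Retain"}
BINDING_MODES = {"Immediate", "WaitForFirstConsumer"}


def build_storage_class(
    name: str,
    provisioner: str,
    reclaim_policy: str = "Delete",          # Kubernetes default
    volume_binding_mode: str = "Immediate",  # Kubernetes default
) -> dict:
    """Build a StorageClass manifest with the two options discussed above."""
    if reclaim_policy not in RECLAIM_POLICIES:
        raise ValueError(f"unknown reclaimPolicy: {reclaim_policy}")
    if volume_binding_mode not in BINDING_MODES:
        raise ValueError(f"unknown volumeBindingMode: {volume_binding_mode}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "reclaimPolicy": reclaim_policy,
        "volumeBindingMode": volume_binding_mode,
    }


# Hypothetical class for node-local disks: keep the data after the PVC is
# deleted, and delay binding until a Pod that uses the PVC exists.
storage_class = build_storage_class(
    name="local-storage",
    provisioner="example.com/hypothetical-provisioner",  # made-up provisioner
    reclaim_policy="Retain",
    volume_binding_mode="WaitForFirstConsumer",
)
print(json.dumps(storage_class, indent=2))
```

Note that both options sit at the top level of the StorageClass rather than under a spec field.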

What Does HPA Do?

When running an application, a fixed number of Pods doesn't fit every situation. It's more efficient to scale Pods up during traffic spikes and scale them down during quiet periods.

The resource in Kubernetes that does this is the HPA (Horizontal Pod Autoscaler).

HPA targets resources like Deployments and automatically increases or decreases the Pod count. In other words, it's a feature that automates the horizontal scaling of applications.

For example, you can set policies like the following.

Minimum Pod count: 2
Maximum Pod count: 6
Increase Pod count when CPU usage exceeds a certain threshold
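
A policy like that could be expressed roughly as follows. This Python sketch builds an HPA manifest as a dictionary. The helper function and the names web-hpa and web are hypothetical, and the 50% CPU target is an example value I picked, since the right threshold depends on your workload.

```python
import json


def build_hpa(
    name: str,
    deployment: str,
    min_replicas: int,
    max_replicas: int,
    cpu_target_percent: int,
) -> dict:
    """Build a HorizontalPodAutoscaler manifest that scales on CPU utilization."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose Pod count is adjusted.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }],
        },
    }


# Hypothetical names; 50 is an example threshold, not a recommendation.
hpa = build_hpa("web-hpa", "web", min_replicas=2, max_replicas=6, cpu_target_percent=50)
print(json.dumps(hpa, indent=2))
```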

What Does HPA Use as Its Basis?

HPA operates based on metrics. The typical basis is resource metrics like CPU usage or memory usage.

For example, if it operates based on CPU usage, it looks at each Pod's usage and determines whether the current Pod count is sufficient.

The important thing here is that HPA doesn't just look at whether "CPU is high." It calculates how current usage compares to a target value and adjusts the Pod count in proportion to that ratio.

Why Do You Need requests?

When HPA operates based on CPU utilization, the requests value set on the Pod matters.

For example, if a Pod has its CPU request set to 100m and actual usage is 50m, utilization is 50%. In other words, the utilization that HPA compares against averageUtilization is measured relative to requests.

So if the containers in a Pod don't have a CPU request set, CPU utilization for that Pod is undefined, and HPA takes no scaling action for that metric.

To summarize.

requests: the amount of a resource reserved for a container, which the scheduler uses as its baseline
HPA averageUtilization: the target utilization, measured relative to requests

So to use HPA properly, you also need to consider Pod resource settings.
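
The arithmetic can be sketched in a few lines of Python. This is a simplified version of the calculation in the Kubernetes documentation (desired replicas = ceil(current replicas × current value / target value)). It assumes every Pod has the same CPU request, and it ignores details such as the tolerance band, Pods that aren't ready, and stabilization windows. The function names are my own.

```python
import math
from typing import List


def cpu_utilization(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as a percentage of the CPU request."""
    if request_millicores <= 0:
        # Without a request there is nothing to divide by.
        raise ValueError("a CPU request is required to compute utilization")
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(
    current_replicas: int,
    pod_usages_millicores: List[float],
    request_millicores: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Simplified HPA calculation: scale in proportion to current / target."""
    average = sum(
        cpu_utilization(u, request_millicores) for u in pod_usages_millicores
    ) / len(pod_usages_millicores)
    raw = math.ceil(current_replicas * average / target_utilization)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# Request 100m, usage 50m: utilization is 50%.
print(cpu_utilization(50, 100))

# Two Pods averaging 100% against a 50% target: double the Pod count.
print(desired_replicas(2, [90, 110], 100, 50, min_replicas=2, max_replicas=6))
```

With two Pods averaging 100% utilization against a 50% target, the ratio is 2, so the sketch asks for four Pods. The minimum and maximum act only as clamps on that result.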

Why Do You Need metrics-server?

For HPA to work, it needs to know the current CPU and memory usage of Pods. The component that typically provides this information is metrics-server.

In other words, HPA doesn't measure metrics on its own. It receives data through a metrics collection component running inside the cluster.

In fact, commands like kubectl top pods only work when this metrics layer is in place.

So before configuring HPA, you should first check whether the cluster is set up to collect resource metrics.

Wrapping Up

In Kubernetes, external traffic, storage, and scaling are each managed through their own resources, just like application execution itself.

External traffic is connected through Ingress or the Gateway API.
Storage is handled through StorageClass, PV, and PVC.
Autoscaling is handled through HPA.

In other words, Kubernetes isn't just a tool for running containers. It's an operations platform that also covers networking, storage, and scaling.