How to handle payment data without going crazy over PCI


I was 16 the first time I heard about "PCI DSS", a security standard introduced by the card payment companies that businesses must follow in order to handle credit card information securely. I was working for a young payments startup that offered portable card terminals (POS) to small merchants. Card numbers were encrypted at the hardware level using multiple layers of AES and the hardware's unique key. The data was streamed over the iPhone audio jack to the mobile app, and from there it travelled over HTTPS to the mobile app's backend API to process the payment. There, the sensitive card information was decrypted and could easily be found stored inside multiple services. All good until we reached the backend, right?

A few years ago, I had the challenge of implementing PCI myself and saw first-hand how hard it is and how quickly you can make things complicated for yourself. At a fast-growing startup with its hands full, it is easy to leave PCI as an afterthought. That works until your business partners require the certification before they give you access to the services or the unit economics you need to keep growing, which you need to raise your next round, or otherwise your company dies. So, yeah. You'll need PCI asap.

My goal is to share my approach to solving this challenge from a technical perspective, for when you really need to handle card data.

The data

Cardholder data includes:

  • The card number: Primary Account Number, PAN
  • Card expiration date
  • Cardholder name

Sensitive authentication data, like:

  • Verification code, CVV/CVC
  • The PIN you type in the bank terminal to approve a transaction and type in the ATM to withdraw cash.
  • The authorisation/signature generated by the card chip (EMV) as the result of approving a transaction with a physical card in a physical terminal (POS).

Good to know

  • PCI is not strictly a tech thing. It also involves processes, people, etc.
  • Cardholder data environment, CDE: anything or anyone that is in contact with, or has access to, the data.
    • People: for example, operations teams, disputes, fraud prevention, customer support.
    • Processes: automated scripts, old manual "first do this, then do that" routines, and the classic: can you send the Excel sheet [that contains sensitive data] to my email?
    • Technology: apps, servers, hardware, infrastructure and third-party providers.

Reduce the Scope

Engineers can easily overlook that they're handling sensitive data when building a piece of software. Data can permeate everywhere, making each system it touches a target for compliance. I went meta and understood that "never touching card data is the simplest way to be PCI compliant". Thus, let's reduce the number of services that need, or could have, access.

We'll be applying tokenisation: turning high-value data into UUID-like garbage. I won't get into the details of how that works, since there are multiple ways to do this and each is incrementally more complicated. If you have more questions about this, please get in touch with me, I'll be happy to answer them. Or use a service like AWS CloudHSM.

For this project we have Go microservices running on Kubernetes. There's a public API for consumers and an internal API, protected by VPNs, where the network partners send notifications.

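The system we'll build has three parts: a reverse proxy that tokenises incoming requests, a forward proxy that detokenises outgoing ones, and a tokenisation service that only the proxies talk to.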

Intercept incoming requests

We'll have a reverse proxy at the API gateway level that intercepts incoming requests and replaces sensitive data with tokens. For each request it will:

  • Authenticate the request.
  • Check for idempotency.
  • Telemetry: start tracing if no trace/span exists
  • ...
  • Look for fields in the incoming request that we know hold sensitive information, like "card_number" or "cvc", and replace them with tokens.

For example, the requester wants to create a new card:

POST /v1/payment_methods
{
  "card_number": "4242424242424242",
  "cvc": "123"
}

Becomes:

POST /v1/payment_methods
{
  "card_number": "tok_1ce85b37fc234451afff384df3c903ba",
  "cvc": "tok_37dd7af618ad494486dcf66c68e2aad3"
}

Only then do we let the request continue on its path. No downstream service will ever touch actual card data, only the token, so it is safe for those services to store the token.
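The replacement step could be sketched in Go roughly like this. It is a minimal sketch, not the actual gateway code: `TokenizeFunc`, `TokenizeBody` and the `sensitiveFields` list are hypothetical names, and the field names are taken from the example request. It assumes JSON bodies and fails closed when a sensitive field is not a string.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// TokenizeFunc is a hypothetical hook that stores a sensitive value in the
// token service and returns the token that replaces it.
type TokenizeFunc func(data string) (string, error)

// sensitiveFields lists the JSON keys treated as card data. The names come
// from the example request; a real list would be longer.
var sensitiveFields = map[string]bool{
	"card_number": true,
	"cvc":         true,
}

// TokenizeBody replaces every sensitive field in a JSON body with a token.
func TokenizeBody(body []byte, tokenize TokenizeFunc) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.UseNumber() // keep numbers exactly as the client sent them
	var doc any
	if err := dec.Decode(&doc); err != nil {
		return nil, fmt.Errorf("decode body: %w", err)
	}
	if err := tokenizeNode(doc, tokenize); err != nil {
		return nil, err
	}
	return json.Marshal(doc)
}

// tokenizeNode walks objects and arrays in place, looking for sensitive keys.
func tokenizeNode(node any, tokenize TokenizeFunc) error {
	switch n := node.(type) {
	case map[string]any:
		for key, val := range n {
			if !sensitiveFields[key] {
				if err := tokenizeNode(val, tokenize); err != nil {
					return err
				}
				continue
			}
			s, ok := val.(string)
			if !ok {
				// Fail closed: never forward a sensitive field we could not tokenise.
				return fmt.Errorf("field %q must be a string", key)
			}
			tok, err := tokenize(s)
			if err != nil {
				return fmt.Errorf("tokenize %q: %w", key, err)
			}
			n[key] = tok
		}
	case []any:
		for _, item := range n {
			if err := tokenizeNode(item, tokenize); err != nil {
				return err
			}
		}
	}
	return nil
}
```

Note that re-encoding the body sorts the object keys, so the forwarded JSON is equivalent to the original but not byte-identical.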

Getting that thing over there

There will be services where we'll need to use the actual card number and other sensitive information, for example a service that has to send the decrypted card details to a third party to process a payment.

Instead of allowing this service to pull the decrypted card numbers, and risk spreading them around, we'll implement a forward proxy. Rather than calling the provider directly, the caller instructs the forward proxy to make the request. The proxy looks up each token in the token service, replaces the tokenised fields with the corresponding card details, and then forwards the request to the third party.

This proxy will have to:

  • Block all requests by default.
  • Only send requests with actual card data to explicitly allowed URLs.
  • Have access to the token service, where we store the tokens.
  • Only accept calls from allowed services.
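The default-deny rules above might look something like the following Go sketch. `Policy` and `Check` are hypothetical names, and two things are assumed rather than taken from the original design: the caller's identity has already been authenticated before `Check` runs, and destinations are matched by host and must use HTTPS.

```go
package main

import (
	"fmt"
	"net/url"
)

// Policy is a hypothetical allowlist for the forward proxy. Anything that is
// not listed is rejected, so an empty Policy blocks every request.
type Policy struct {
	Callers map[string]bool // services allowed to call the proxy
	Hosts   map[string]bool // destination hosts allowed to receive card data
}

// Check returns nil only when both the caller and the destination are allowed.
func (p Policy) Check(caller, rawURL string) error {
	if !p.Callers[caller] {
		return fmt.Errorf("caller %q is not allowed to use the proxy", caller)
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("parse destination: %w", err)
	}
	// Assumption: card data should never leave over plain HTTP.
	if u.Scheme != "https" {
		return fmt.Errorf("destination scheme %q is not allowed", u.Scheme)
	}
	if !p.Hosts[u.Hostname()] {
		return fmt.Errorf("destination host %q is not allowed", u.Hostname())
	}
	return nil
}
```

Matching on the host is the simplest option; a stricter policy could also pin the path prefix or the HTTP method for each destination.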

For example:

POST http://forward_proxy.ns_tokenisation.svc.cluster.local:8080
{
  "type": "http", // or grpc, graphql, etc.
  "http": {
    "url": "https://na-gateway.mastercard.com/api/rest/version/43/merchant/1234/order/1234/transaction/1234",
    "method": "PUT",
    "headers": {...},
    "payload": {
      "sourceOfFunds": {
        "cardNumber": "tok_89c34a5056a548f7a61af6f11ec41562",
        "type": "CARD"
      },
      "order": {
        "amount": 123.45,
        "currency": "USD"
      }
    }
  }
}

The proxy will replace the tokens with the corresponding values and execute the request.

PUT https://na-gateway.mastercard.com/api/rest/version/43/merchant/1234/order/1234/transaction/1234
...headers
{
  "sourceOfFunds": {
    "cardNumber": "4242424242424242",
    "type": "CARD"
  },
  "order": {
    "amount": 123.45,
    "currency": "USD"
  }
}
---
Response:
{
  "order.id": "975507c27706",
  "order.amount": 123.45,
  "order.currency": "USD"
}
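The token replacement inside the forward proxy could be sketched as below. This is an illustration under assumptions, not the real proxy: `RetrieveFunc` and `DetokenizePayload` are hypothetical names, the payload is assumed to be JSON, and any string value starting with `tok_` is treated as a token, which is a convention inferred from the examples.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// tokenPrefix marks a string value as a token (assumed from the examples).
const tokenPrefix = "tok_"

// RetrieveFunc is a hypothetical hook that asks the token service for the
// sensitive value stored under a token id.
type RetrieveFunc func(id string) (string, error)

// DetokenizePayload swaps every token in a JSON payload for its real value.
func DetokenizePayload(payload []byte, retrieve RetrieveFunc) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(payload))
	dec.UseNumber() // keep amounts exactly as written
	var doc any
	if err := dec.Decode(&doc); err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	out, err := detokenize(doc, retrieve)
	if err != nil {
		return nil, err
	}
	return json.Marshal(out)
}

// detokenize walks the document and returns it with tokens replaced.
func detokenize(node any, retrieve RetrieveFunc) (any, error) {
	switch n := node.(type) {
	case string:
		if !strings.HasPrefix(n, tokenPrefix) {
			return n, nil
		}
		data, err := retrieve(n)
		if err != nil {
			return nil, fmt.Errorf("retrieve %s: %w", n, err)
		}
		return data, nil
	case map[string]any:
		for key, val := range n {
			resolved, err := detokenize(val, retrieve)
			if err != nil {
				return nil, err
			}
			n[key] = resolved
		}
		return n, nil
	case []any:
		for i, item := range n {
			resolved, err := detokenize(item, retrieve)
			if err != nil {
				return nil, err
			}
			n[i] = resolved
		}
		return n, nil
	default:
		return node, nil
	}
}
```

The result of `DetokenizePayload` should only ever be written to the outgoing connection. Logging it or returning it to the caller would bring card data back into scope.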

Tokenisation

In this article I won't get into the details of how tokens are handled or how one should handle the encryption. Banks usually have access to (expensive) hardware security modules (HSMs) that do the encryption part. If you don't have access to one, you could use a managed service like AWS CloudHSM.

The tokenisation service will generate a unique id for each piece of data that needs to be stored, and make sure the original sensitive data is encrypted at rest. It must have strong access controls and access logs. Only the proxies should be allowed to make requests to it.

The service looks like this:

tokens-service.proto
syntax = "proto3";

package namespace.service.tokens.v1;

option go_package = "github.com/namespace/api/service.tokens/proto;tokenspb";

// TokensService swaps sensitive values for opaque token ids and back.
service TokensService {
  // CreateToken stores the sensitive data and returns the id of a new token.
  rpc CreateToken(CreateTokenRequest) returns (CreateTokenResponse) {}
  // RetrieveToken returns the sensitive data stored under a token id.
  rpc RetrieveToken(RetrieveTokenRequest) returns (RetrieveTokenResponse) {}
}

message CreateTokenRequest {
  string data = 1;
}

message CreateTokenResponse {
  string id = 1;
}

message RetrieveTokenRequest {
  string id = 1;
}

message RetrieveTokenResponse {
  string data = 1;
}
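To make the two RPCs more concrete, here is a minimal in-memory sketch of what could sit behind them. It is an assumption-heavy illustration, not the real service: `Vault`, `NewVault` and `ErrTokenNotFound` are hypothetical names, a map stands in for the database, and the AES-GCM key is passed in directly, whereas in production it would live in an HSM or a managed key service. Token ids follow the `tok_` plus 32 hex characters shape used in the examples.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// ErrTokenNotFound is returned when no record exists for a token id.
var ErrTokenNotFound = errors.New("token not found")

// Vault is a hypothetical in-memory stand-in for the token store.
type Vault struct {
	mu      sync.RWMutex
	aead    cipher.AEAD
	records map[string][]byte // token id -> nonce followed by ciphertext
}

// NewVault builds a Vault from an AES key of 16, 24 or 32 bytes.
func NewVault(key []byte) (*Vault, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, fmt.Errorf("init cipher: %w", err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, fmt.Errorf("init gcm: %w", err)
	}
	return &Vault{aead: aead, records: make(map[string][]byte)}, nil
}

// CreateToken encrypts data and stores it under a new random token id.
func (v *Vault) CreateToken(data string) (string, error) {
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		return "", fmt.Errorf("generate id: %w", err)
	}
	id := "tok_" + hex.EncodeToString(raw)

	nonce := make([]byte, v.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", fmt.Errorf("generate nonce: %w", err)
	}
	// The token id is passed as additional data, so a ciphertext cannot be
	// moved under a different id without decryption failing.
	sealed := v.aead.Seal(nonce, nonce, []byte(data), []byte(id))

	v.mu.Lock()
	v.records[id] = sealed
	v.mu.Unlock()
	return id, nil
}

// RetrieveToken decrypts and returns the data stored under id.
func (v *Vault) RetrieveToken(id string) (string, error) {
	v.mu.RLock()
	sealed, ok := v.records[id]
	v.mu.RUnlock()
	if !ok {
		return "", ErrTokenNotFound
	}
	n := v.aead.NonceSize()
	if len(sealed) < n {
		return "", errors.New("corrupt record")
	}
	plain, err := v.aead.Open(nil, sealed[:n], sealed[n:], []byte(id))
	if err != nil {
		return "", fmt.Errorf("decrypt: %w", err)
	}
	return string(plain), nil
}
```

Each call to `CreateToken` returns a fresh id even for the same card number, so tokens from this sketch cannot be used to tell whether two requests carried the same card.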