Leveraging Athena for log analytics
Athena is an interactive query service that allows users to execute complex SQL queries across vast datasets, enabling a depth of analysis beyond basic monitoring. Athena’s ability to query security logs from various sources, including Security Lake, is invaluable for identifying complex patterns and correlations indicative of sophisticated security threats.
With Athena, organizations can analyze their security data on demand, as soon as the logs are available to query, which is crucial for timely detection of and response to potential security threats. Athena also facilitates the creation of comprehensive security reports, which are useful for internal audits, compliance verification, or incident response documentation.
As an example, consider the following SQL query in Athena, which combines data from CloudTrail and VPC flow logs to detect unusual patterns indicative of a potential security threat:
-- CloudTrail API events that start or stop EC2 instances
WITH cloudtrail_events AS (
    SELECT
        eventTime,
        eventName,
        awsRegion,
        sourceIPAddress,
        userAgent,
        eventSource,
        recipientAccountId
    FROM cloudtrail_logs
    WHERE eventName IN ('StartInstances', 'StopInstances')
),
-- VPC flow log entries for rejected traffic
vpc_flow AS (
    SELECT
        interfaceId,
        startTime,
        endTime,
        sourceAddress,
        destinationAddress,
        action
    FROM vpc_flow_logs
    WHERE action = 'REJECT'
)
SELECT
    ct.eventTime AS apiEventTime,
    ct.eventName AS apiEventName,
    ct.awsRegion AS apiRegion,
    ct.sourceIPAddress AS apiSourceIP,
    vpc.startTime AS flowStartTime,
    vpc.endTime AS flowEndTime,
    vpc.sourceAddress AS flowSourceIP,
    vpc.destinationAddress AS flowDestIP,
    vpc.action AS networkAction
FROM
    cloudtrail_events ct
JOIN
    vpc_flow vpc
ON
    ct.sourceIPAddress = vpc.sourceAddress
-- Assumes eventTime, startTime, and endTime share a comparable type
-- (for example, all timestamps); cast them first if they do not
WHERE
    ct.eventTime BETWEEN vpc.startTime AND vpc.endTime
ORDER BY
    ct.eventTime;
The preceding query does the following:
- It creates two common table expressions (CTEs): cloudtrail_events for CloudTrail logs and vpc_flow for VPC flow logs.
- In cloudtrail_events, it selects relevant fields from CloudTrail logs, filtering for specific events such as StartInstances or StopInstances, which could indicate unauthorized instance manipulation.
- In vpc_flow, it selects data from VPC flow logs where network traffic was rejected, which could signal blocked attempts to access resources.
- The main SELECT statement joins these two datasets on the condition that the source IP address in the CloudTrail log matches the source address in the VPC flow log entry. Additionally, it requires that the CloudTrail event time falls between the start and end times of that flow log entry.
- The query then orders the results by the event time from CloudTrail, providing a chronological view of potentially related API and network activities.
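The correlation logic described in these steps can be tried locally before it is run against real logs. The following is a minimal sketch, not an Athena deployment: it uses Python's built-in sqlite3 module as a stand-in for Athena, a trimmed-down version of the query, and a handful of hypothetical rows. The sample IP addresses, the integer timestamps, and the reduced column set are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical sample rows; times are plain integers (seconds) for simplicity.
# (eventTime, eventName, sourceIPAddress)
CLOUDTRAIL_ROWS = [
    (1000, "StopInstances", "203.0.113.10"),
    (2000, "StartInstances", "198.51.100.7"),
    (3000, "DescribeInstances", "203.0.113.10"),
]
# (startTime, endTime, sourceAddress, action)
VPC_FLOW_ROWS = [
    (990, 1050, "203.0.113.10", "REJECT"),
    (1990, 2050, "198.51.100.7", "ACCEPT"),
    (2990, 3050, "203.0.113.10", "REJECT"),
]

# Trimmed-down version of the chapter's query: same CTEs, join, and time filter.
CORRELATION_SQL = """
WITH cloudtrail_events AS (
    SELECT eventTime, eventName, sourceIPAddress
    FROM cloudtrail_logs
    WHERE eventName IN ('StartInstances', 'StopInstances')
),
vpc_flow AS (
    SELECT startTime, endTime, sourceAddress, action
    FROM vpc_flow_logs
    WHERE action = 'REJECT'
)
SELECT ct.eventTime, ct.eventName, ct.sourceIPAddress, vpc.action
FROM cloudtrail_events ct
JOIN vpc_flow vpc
    ON ct.sourceIPAddress = vpc.sourceAddress
WHERE ct.eventTime BETWEEN vpc.startTime AND vpc.endTime
ORDER BY ct.eventTime
"""


def correlate(cloudtrail_rows, vpc_flow_rows):
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cloudtrail_logs "
            "(eventTime INTEGER, eventName TEXT, sourceIPAddress TEXT)"
        )
        conn.execute(
            "CREATE TABLE vpc_flow_logs "
            "(startTime INTEGER, endTime INTEGER, sourceAddress TEXT, action TEXT)"
        )
        conn.executemany("INSERT INTO cloudtrail_logs VALUES (?, ?, ?)", cloudtrail_rows)
        conn.executemany("INSERT INTO vpc_flow_logs VALUES (?, ?, ?, ?)", vpc_flow_rows)
        return conn.execute(CORRELATION_SQL).fetchall()
    finally:
        conn.close()


matches = correlate(CLOUDTRAIL_ROWS, VPC_FLOW_ROWS)
for row in matches:
    print(row)
```

With this sample data, only the StopInstances call is returned: it is the one API event of interest whose source IP address also appears in a rejected flow during the same time window. The StartInstances call is dropped because the matching flow was accepted, and the DescribeInstances call is dropped by the event name filter.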
By correlating CloudTrail and VPC flow logs, this query helps identify cases where API calls that start or stop EC2 instances coincide with rejected network traffic from the same IP address. This pattern could suggest a targeted attack, where an adversary is attempting to manipulate AWS resources while simultaneously probing the network for vulnerabilities or attempting unauthorized access. This insight allows security teams to conduct a focused investigation, check for compromised credentials, or identify the need for tighter security controls.
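To fold a query like this into an investigation or reporting workflow, it can also be submitted programmatically. The following is a minimal sketch that assumes a boto3 Athena client is passed in; the helper name run_athena_query, the polling interval, and the database and S3 output location supplied by the caller are all hypothetical. It reads only the first page of results, so a larger result set would need pagination.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return rows as dictionaries.

    `athena` is expected to behave like boto3.client("athena").
    Only the first page of results is read in this sketch.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(
            f"Athena query {query_id} ended in state {status['State']}: {reason}"
        )

    # For SELECT queries, the first row of the first page holds the column names.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

A caller would create the client with boto3.client("athena") and pass the query text along with its own database name and an S3 location it controls for query results. Accepting the client as a parameter keeps the helper easy to exercise with a stub, without AWS credentials.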