
Anomaly Detection #535

Open
michaelconnor00 opened this issue Dec 19, 2024 · 8 comments

michaelconnor00 commented Dec 19, 2024

There have been some anomaly detections regarding an increase in DataTransfer-Regional-Bytes. The only trace of this usage in the bills is under DataTransfer, described as "regional data transfer - in/out/between EC2 AZs or using elastic IPs or ELB".

Looking on Google, the best way to troubleshoot this is with VPC Flow Logs, which are now enabled and writing to S3. An Athena integration is configured so the logs can be queried from there.

The log format is as follows:

${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${ecs-cluster-name} ${ecs-cluster-arn} ${ecs-container-instance-id} ${ecs-container-instance-arn} ${ecs-service-name} ${ecs-task-definition-arn} ${ecs-task-id} ${ecs-task-arn} ${ecs-container-id} ${ecs-second-container-id}
michaelconnor00 self-assigned this Dec 19, 2024

michaelconnor00 commented Dec 19, 2024

Using Athena, the date range of the data is as follows:

image

So a little under 4hrs.
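For reference, the range check was a query roughly of this shape (a sketch; the table name vpc_flow_logs and the underscore-style column names are assumptions, with start/end being epoch seconds per the format above):

```sql
-- Sketch: check the time span covered by the flow log table (name assumed).
SELECT
  from_unixtime(MIN("start")) AS earliest,
  from_unixtime(MAX("end"))   AS latest
FROM vpc_flow_logs;
```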

Doing a query to see what IPs are doing the most talking (src):

image

The most concerning are those IPs that are over a few hundred MBs, which would be the top 7 or so. However, the biggest smoking gun is IP 10.10.65.122, which is RDS.

Doing a similar query to see top IPs listening (dest):

image

Here the top 10 seem to be doing a lot, and the top consumer is 10.10.45.25 (old instance IP I think), followed by 10.10.65.122 (RDS).
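Both of these queries are roughly of this shape (a sketch; table name assumed, and swapping in dstaddr gives the listening side):

```sql
-- Sketch: top talkers by total bytes sent; use dstaddr instead of srcaddr
-- to rank the listening side. Table name vpc_flow_logs is an assumption.
SELECT
  srcaddr,
  SUM(bytes) AS total_bytes
FROM vpc_flow_logs
GROUP BY srcaddr
ORDER BY total_bytes DESC
LIMIT 20;
```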

Digging more into RDS, src:

image

And dest:

image
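The RDS drill-down is the same query restricted to the RDS address (a sketch; table name assumed, reverse srcaddr/dstaddr for the dest view):

```sql
-- Sketch: who RDS (10.10.65.122) is sending to, and how much;
-- swap the src/dst columns for the dest view.
SELECT
  dstaddr,
  SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE srcaddr = '10.10.65.122'
GROUP BY dstaddr
ORDER BY total_bytes DESC;
```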

Notes:

  • Interface eni-071b9586bf9f16cee is attached to the RDS instance.
  • All the other traffic is from ECS tasks.

Questions:

  • Why so much data over a 4 hr period???


michaelconnor00 commented Dec 19, 2024

Created another query this morning:

image

I also wrote a script to more easily identify the IP addresses:

==> ENI IP Addresses
eni-01d0ba4f9618bf16d - 10.10.23.200 - /i-0562df127e1ce05ff - EcsSg17B4B0B3-XN4WMB249P17 - EC2 Instance
eni-0fde57dba38bd9c99 - 10.10.23.209 - /i-0562df127e1ce05ff - ASGInstanceSecurity - 
eni-07ae8c53f56524b17 - 10.10.0.101 - 52.44.36.97/amazon-elb - MermaidApiLoadBalancerSecurity - ELB app/merma-Merma-7G2ZIRYPYQN4/8838074e02959036
eni-0df0f70bf43302d49 - 10.10.0.178 - 44.206.31.158/554812291621 -  - Interface for NAT Gateway nat-0706d94f72b9c299f
eni-0c11bc80d405d2f15 - 10.10.39.214 - /i-02167ba53ff8102f6 - ASGInstanceSecurity - 
eni-071b9586bf9f16cee - 10.10.65.122 - / - PostgresRdsV2Security - RDSNetworkInterface
eni-0eec7486d4a546f76 - 10.10.42.89 - /i-0b0d149e875fb22cf - ASGInstanceSecurity - 
eni-0ce35eaf2d4a949e1 - 10.10.1.240 - /i-0f83dbcf0f94b8441 - gadmin-sg - 
eni-0123be6bfb8d7de87 - 10.10.42.65 - /i-02167ba53ff8102f6 - EcsSg17B4B0B3-XN4WMB249P17 - EC2 Instance
eni-05ca2f004fed01083 - 10.10.1.250 - 44.223.200.216/amazon-elb - MermaidApiLoadBalancerSecurity - ELB app/merma-Merma-7G2ZIRYPYQN4/8838074e02959036
eni-07725543a40609fab - 10.10.38.175 - /i-0b0d149e875fb22cf - EcsSg17B4B0B3-XN4WMB249P17 - EC2 Instance
eni-0ff178d9e8fda6f11 - 10.10.2.83 - 52.55.13.147/amazon-elb - MermaidApiLoadBalancerSecurity - ELB app/merma-Merma-7G2ZIRYPYQN4/8838074e02959036
eni-05851e2df640f3cd7 - 10.10.56.119 - /i-03754798c547d6338 - EcsSg17B4B0B3-XN4WMB249P17 - EC2 Instance
eni-0dc7de36ac126592b - 10.10.59.20 - /i-03754798c547d6338 - ASGInstanceSecurity - 

The top traffic was through ENI eni-0eec7486d4a546f76, which is attached to an EC2 instance. The source IP is 16.182.70.90, which is an Amazon public IP in us-east-1. However, I cannot tell what the source is. Also note, the ENI is using the security group ASGInstanceSecurity, and not the ECS group EcsSg17B4B0B3-XN4WMB249P17.

Still can't tell if this traffic is valid, but it doesn't seem to be.

Also note, the query is limited to Dec 19 7 am PST through now-ish (Dec 19 11 am PST).

@michaelconnor00

See this reference for definitions for each log attribute: https://docs.aws.amazon.com/vpc/latest/userguide/flow-log-records.html#flow-logs-default

After seeing this I realized there were more attributes available, so I created a new VPC flow log with all available attributes and another Athena integration.

The screenshot below is a query over only 10 min of data. As you can see, the IPs we were concerned with belong to S3.

image

The top results are RDS and S3. So it seems the issue is internal traffic.
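With the extended attributes, the attribution query was along these lines (a sketch; the new table name and the underscore forms of pkt-src-aws-service / pkt-dst-aws-service are assumptions):

```sql
-- Sketch: attribute top flows to AWS services via the extended flow log fields.
-- Table name vpc_flow_logs_v2 is an assumption for the new integration.
SELECT
  srcaddr,
  dstaddr,
  pkt_src_aws_service,
  pkt_dst_aws_service,
  SUM(bytes) AS total_bytes
FROM vpc_flow_logs_v2
GROUP BY srcaddr, dstaddr, pkt_src_aws_service, pkt_dst_aws_service
ORDER BY total_bytes DESC
LIMIT 20;
```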

@michaelconnor00

Confirming that no traffic is cross-region:

image

@michaelconnor00

Query for Traffic Path:

Definition
1 — Through another resource in the same VPC, including resources that create a network interface in the VPC
2 — Through an internet gateway or a gateway VPC endpoint
3 — Through a virtual private gateway
4 — Through an intra-region VPC peering connection
5 — Through an inter-region VPC peering connection
6 — Through a local gateway
7 — Through a gateway VPC endpoint (Nitro-based instances only)
8 — Through an internet gateway (Nitro-based instances only)

image

Traffic path 1 is not concerning; paths 7 & 8 need more digging:

image

Traffic path 7 is all egress traffic to S3.

image

All of the traffic-path-8 traffic makes sense, as it is all from the NAT gateway (10.10.0.178) or the ELB (10.10.2.83, 10.10.1.250).
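The traffic-path breakdown and the path 7/8 drill-down were queries along these lines (a sketch; table and column names assumed, run as separate statements):

```sql
-- Sketch: total bytes per traffic path (run each statement separately in Athena).
SELECT traffic_path, SUM(bytes) AS total_bytes, COUNT(*) AS records
FROM vpc_flow_logs_v2
GROUP BY traffic_path
ORDER BY total_bytes DESC;

-- Sketch: drill into paths 7 and 8 to see which addresses are involved.
SELECT srcaddr, dstaddr, flow_direction, SUM(bytes) AS total_bytes
FROM vpc_flow_logs_v2
WHERE traffic_path IN (7, 8)
GROUP BY srcaddr, dstaddr, flow_direction
ORDER BY total_bytes DESC;
```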


@michaelconnor00

Query all data where the AWS service is S3, summing bytes and counting records.

image

There is some consistency across the records, and it looks like these could be backups.
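That query was roughly of this shape (a sketch; table/column names assumed):

```sql
-- Sketch: total bytes and record counts for flows where either side is S3.
SELECT
  srcaddr,
  dstaddr,
  SUM(bytes) AS total_bytes,
  COUNT(*) AS records
FROM vpc_flow_logs_v2
WHERE pkt_src_aws_service = 'S3' OR pkt_dst_aws_service = 'S3'
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC;
```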

The next query limits the date range from 2025/01/05 @ 3 pm PST to 2025/01/06 @ 3 pm PST.

image

image

This shows some records that are large but don't match the previous query. Next, let's filter by one of the sources.

SRCADDR = 52.216.213.42
image

SRCADDR = 52.216.88.206, with timestamps added
image

Dev Backup size on S3
image

Prod Backup size on S3
image

Based on the size of each record, it can't be the backups. And they should only run once a day.

Looking at CloudWatch log groups, we can see that the job is only running once. So that does not explain it. We may need to look at S3 access logs.

This also is not related to the large amounts of data flowing between ECS and RDS.


michaelconnor00 commented Jan 7, 2025

This query is looking at records that are NOT to S3, which I believe are all ingress/egress to RDS (10.10.65.122).

The time frame is between Jan 5/2025 and Jan 6/2025 (24hr period).

image

As can be seen, the top 10 are gigabytes in size. Note that each record is a 1-minute aggregation. It seems unreasonable that there is 3 GB of data transfer from RDS to ECS containers in 1 min.

Grouping the records, summing the bytes, and counting the records in each group gives us the following.

image

The top 4 are concerning, as they total 11 GB over 24 hours from only 4-6 records (4-6 minutes). The other results, where the record counts are high, make sense for database traffic.
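The grouping query was along these lines (a sketch; the table/column names and the UTC conversion of the PST window are assumptions):

```sql
-- Sketch: group non-S3 flows by src/dst pair over the Jan 5-6 window (3 pm PST
-- to 3 pm PST, i.e. 23:00 UTC), summing bytes and counting the 1-minute records.
SELECT
  srcaddr,
  dstaddr,
  SUM(bytes) AS total_bytes,
  COUNT(*) AS records
FROM vpc_flow_logs_v2
WHERE (pkt_src_aws_service IS NULL OR pkt_src_aws_service <> 'S3')
  AND (pkt_dst_aws_service IS NULL OR pkt_dst_aws_service <> 'S3')
  AND "start" BETWEEN to_unixtime(timestamp '2025-01-05 23:00:00')
                  AND to_unixtime(timestamp '2025-01-06 23:00:00')
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC;
```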

The next query filters to the one IP address that has 6 records and 11 GB.

image

As can be seen, the 11 GB includes both ingress and egress traffic. The top ten records are the concerning ones. Ingress from 10.10.63.72 was over 4 GB in 1 min.

How can that much data be transferred to RDS in 1 min when the DB size is about 700 MB? Same question for the egress records over 3 GB.
