Logging and Monitoring
Logging and monitoring provide visibility into what is happening inside a Google Cloud environment. They are essential for troubleshooting, performance tuning, security investigations, and operational awareness.
Google Cloud provides native services for both, Cloud Logging and Cloud Monitoring, as part of Google Cloud Observability (formerly the operations suite).
Logging in Google Cloud
Cloud Logging collects and stores logs generated by Google Cloud services and workloads.
By default, many services such as Compute Engine and GKE generate logs automatically. Operating system and application logs from virtual machines require the Ops Agent to be installed.
Logs are created and stored at the project level by default but can be centralized for easier management.
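As a minimal sketch of the application side of this, the following formatter writes one JSON object per line with `severity` and `message` fields, which Cloud Logging treats as structured log fields when the line is ingested as JSON. The logger name and the helper `build_logger` are hypothetical, and the sketch assumes the output is picked up by an agent or runtime that parses JSON lines (on a VM this typically requires Ops Agent configuration that is not shown here).

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # Python level names (INFO, WARNING, ERROR, ...) are also
            # valid Cloud Logging severity values.
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        return json.dumps(entry)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    build_logger("checkout").warning("payment retry %d of %d", 2, 3)
```

Writing to standard output keeps the application independent of any particular client library; collection and shipping are left to the agent.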
Log capabilities
Cloud Logging supports the following capabilities:
- Collecting logs from Google Cloud services
- Ingesting operating system and application logs
- Creating log-based metrics
- Routing logs to destinations such as Cloud Storage, BigQuery, and Pub/Sub
- Searching and analyzing logs in near real time
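To make the log-based metrics capability concrete, here is a hedged sketch that builds the body of a counter metric definition as a plain dictionary, using field names from the Cloud Logging `LogMetric` resource. The metric name and filter are illustrative assumptions, and nothing is sent to the API.

```python
import json


def counter_metric(name: str, log_filter: str, description: str = "") -> dict:
    """Build a LogMetric body that counts log entries matching a filter."""
    return {
        "name": name,
        "description": description,
        "filter": log_filter,
        # A counter metric is a DELTA metric with INT64 values.
        "metricDescriptor": {"metricKind": "DELTA", "valueType": "INT64"},
    }


def monitoring_metric_type(name: str) -> str:
    """Metric type under which a user-defined log-based metric appears."""
    return f"logging.googleapis.com/user/{name}"


if __name__ == "__main__":
    # Hypothetical metric: count error-level entries from Compute Engine VMs.
    metric = counter_metric(
        "vm_error_count",
        'resource.type="gce_instance" AND severity>=ERROR',
        "Error-level log entries from VMs",
    )
    print(json.dumps(metric, indent=2))
    print(monitoring_metric_type(metric["name"]))
```

Once created, such a metric can be charted and alerted on in Cloud Monitoring like any other metric.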
Log retention and cost
Different log types have different default retention periods and cost models. The 30-day periods below are defaults that can be adjusted on the log bucket.
- Admin Activity logs are retained for 400 days at no charge; this period cannot be changed
- Data Access logs are retained for 30 days and are charged by ingested volume
- Network logs such as VPC Flow Logs are retained for 30 days and are charged by ingested volume
- Ops Agent logs are retained for 30 days and are charged by ingested volume
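The retention figures above can be turned into a small lookup, for example to work out when an entry written on a given day would age out. This is a sketch only: the dictionary keys and the function name are made up for illustration, and the values are the defaults listed in this section.

```python
from datetime import date, timedelta

# Default retention in days, taken from the list above.
RETENTION_DAYS = {
    "admin_activity": 400,
    "data_access": 30,
    "network": 30,
    "ops_agent": 30,
}

# Log types that are charged by volume according to the list above.
BILLED_BY_VOLUME = {"data_access", "network", "ops_agent"}


def expiry_date(log_type: str, written_on: date) -> date:
    """Return the date on which an entry reaches the end of its retention."""
    return written_on + timedelta(days=RETENTION_DAYS[log_type])


if __name__ == "__main__":
    for log_type in RETENTION_DAYS:
        billed = "billed" if log_type in BILLED_BY_VOLUME else "free"
        print(log_type, expiry_date(log_type, date(2024, 1, 1)), billed)
```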
Types of logs
Google Cloud audit logs
Audit logs record actions performed on Google Cloud resources.
- Admin Activity logs capture configuration changes and are always enabled
- Data Access logs record data read and write operations and are disabled by default for most services (BigQuery is the main exception)
- System Event logs record actions that Google Cloud itself takes to change resource configuration and are always enabled
- Access Transparency logs show when Google personnel access customer data, for example for support purposes
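Each of these log types is written under its own log name, so a query can target one type at a time. The sketch below builds a Logging query language filter string for a project; the project ID and the method name in the example are illustrative assumptions, and the helper `audit_filter` is hypothetical.

```python
from typing import Optional

# Log IDs as they appear in the logName field (the "/" is URL-encoded as %2F).
AUDIT_LOG_IDS = {
    "admin_activity": "cloudaudit.googleapis.com%2Factivity",
    "data_access": "cloudaudit.googleapis.com%2Fdata_access",
    "system_event": "cloudaudit.googleapis.com%2Fsystem_event",
    "access_transparency": "cloudaudit.googleapis.com%2Faccess_transparency",
}


def audit_filter(project_id: str, kind: str,
                 method_name: Optional[str] = None) -> str:
    """Build a filter that selects one audit log type in one project."""
    parts = [f'logName="projects/{project_id}/logs/{AUDIT_LOG_IDS[kind]}"']
    if method_name:
        # protoPayload.methodName applies to the audit log types; Access
        # Transparency entries use a different payload shape.
        parts.append(f'protoPayload.methodName="{method_name}"')
    return " AND ".join(parts)


if __name__ == "__main__":
    # Hypothetical project and method, for illustration only.
    print(audit_filter("my-project", "admin_activity",
                       "v1.compute.instances.delete"))
```

The resulting string can be pasted into Logs Explorer or used as a sink filter.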
Admin and user audit logs
These logs track changes made through identity and administration systems such as Google Workspace and Cloud Identity. They include actions such as user creation, group updates, and security configuration changes.
Because retention in the Google Admin console is limited, exporting these logs is recommended for compliance and auditing.
Ops Agent logs
The Ops Agent provides visibility inside virtual machines. It collects system logs and logs from applications such as web servers and databases.
This allows teams to understand what is happening within the operating system rather than only at the infrastructure level.
Network logs
Network logs provide insight into traffic and connectivity.
- VPC Flow Logs capture network flow metadata
- Firewall rule logs show allowed and denied traffic
- Cloud NAT logs track outbound internet connectivity
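As an example of what flow metadata allows, the sketch below totals bytes sent per source IP across a batch of exported VPC Flow Log entries. It assumes each entry is a dictionary with `jsonPayload.connection.src_ip` and `jsonPayload.bytes_sent`, and that `bytes_sent` may arrive as a string; the sample records are invented for illustration.

```python
from collections import Counter
from typing import Iterable


def bytes_by_source(entries: Iterable[dict]) -> Counter:
    """Sum bytes_sent per source IP, skipping entries without a source."""
    totals: Counter = Counter()
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        src_ip = payload.get("connection", {}).get("src_ip")
        if src_ip is None:
            continue
        # Exported 64-bit integers are often encoded as strings.
        totals[src_ip] += int(payload.get("bytes_sent", 0))
    return totals


if __name__ == "__main__":
    # Invented sample records, not real traffic.
    sample = [
        {"jsonPayload": {"connection": {"src_ip": "10.0.0.2",
                                        "dest_ip": "10.0.0.9"},
                         "bytes_sent": "1200"}},
        {"jsonPayload": {"connection": {"src_ip": "10.0.0.2",
                                        "dest_ip": "10.0.0.7"},
                         "bytes_sent": "300"}},
        {"jsonPayload": {"connection": {"src_ip": "10.0.0.5",
                                        "dest_ip": "10.0.0.9"},
                         "bytes_sent": "800"}},
    ]
    for ip, total in bytes_by_source(sample).most_common():
        print(ip, total)
```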
Metrics and monitoring
In addition to logs, Google Cloud collects metrics, which are numerical time series data. Metrics are stored in Cloud Monitoring and are used to build dashboards and alerts.
Types of metrics
There are three main categories of metrics.
- Google Cloud metrics collected automatically by the platform
- Agent metrics collected by the Ops Agent from virtual machines
- Custom metrics defined by applications and workloads
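The three categories can usually be told apart by the prefix of the metric type string. The classifier below is a rough heuristic under that assumption, not an exhaustive rule set: the prefix lists are partial, and the function and label names are hypothetical.

```python
# Prefixes commonly used for user-defined metrics (not an exhaustive list).
CUSTOM_PREFIXES = (
    "custom.googleapis.com/",
    "external.googleapis.com/",
    "workload.googleapis.com/",
)
AGENT_PREFIX = "agent.googleapis.com/"


def metric_category(metric_type: str) -> str:
    """Classify a metric type string into one of the three categories."""
    if metric_type.startswith(AGENT_PREFIX):
        return "agent"
    if metric_type.startswith(CUSTOM_PREFIXES):
        return "custom"
    domain = metric_type.split("/", 1)[0]
    if domain.endswith(".googleapis.com"):
        # Heuristic: any other Google API domain is treated as a
        # platform-collected metric.
        return "google_cloud"
    return "unknown"


if __name__ == "__main__":
    for metric in (
        "compute.googleapis.com/instance/cpu/utilization",
        "agent.googleapis.com/memory/percent_used",
        "custom.googleapis.com/checkout/latency",  # hypothetical metric
    ):
        print(metric, "->", metric_category(metric))
```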
Monitoring workflow
The monitoring workflow typically follows these steps:
- Metrics are ingested from services and agents
- Metrics are explored using Metrics Explorer
- Dashboards are created to visualize system health
- Alerts are configured to notify teams when thresholds are exceeded
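Two key concepts when working with metrics are the resource type, which identifies what is being measured (for example gce_instance), and the metric type, which identifies the measurement itself (for example compute.googleapis.com/instance/cpu/utilization).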
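The alerting step, and the role of resource type and metric type in it, can be sketched as a threshold alert policy body. The dictionary below uses field names from the Cloud Monitoring `AlertPolicy` resource; the display name, threshold, and durations are illustrative assumptions, notification channels are left out, and nothing is sent to the API.

```python
import json


def threshold_policy(display_name: str, metric_type: str, resource_type: str,
                     threshold: float, duration_s: int = 300) -> dict:
    """Build an alert policy that fires when a metric stays above a threshold."""
    # The filter pairs a metric type with a monitored resource type.
    metric_filter = (
        f'metric.type="{metric_type}" AND resource.type="{resource_type}"'
    )
    return {
        "displayName": display_name,
        "combiner": "OR",
        "conditions": [{
            "displayName": f"{metric_type} above {threshold}",
            "conditionThreshold": {
                "filter": metric_filter,
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                # How long the condition must hold before alerting.
                "duration": f"{duration_s}s",
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }],
            },
        }],
    }


if __name__ == "__main__":
    policy = threshold_policy(
        "High CPU on VMs",  # hypothetical name and threshold
        "compute.googleapis.com/instance/cpu/utilization",
        "gce_instance",
        0.8,
    )
    print(json.dumps(policy, indent=2))
```

Requiring the condition to hold for a duration, rather than alerting on a single sample, is a common way to reduce noise from short spikes.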
Centralized logging architecture
Although logs are generated per project, they do not need to be managed separately. Google Cloud supports centralized logging using log sinks.
Log Router and sinks
Every project includes a Log Router, which evaluates each incoming log entry against the project's sinks. By default, logs remain in the project where they are generated.
Sinks can be created to copy logs to a central destination such as:
- Log buckets
- Cloud Storage
- BigQuery
- Pub/Sub
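When a sink is defined, each of these destinations is identified by a string in a service-specific format. The helpers below sketch those formats; the function names and all project, bucket, dataset, and topic names are placeholders.

```python
def log_bucket_destination(project: str, location: str, bucket: str) -> str:
    """Destination string for a Cloud Logging log bucket."""
    return (f"logging.googleapis.com/projects/{project}"
            f"/locations/{location}/buckets/{bucket}")


def cloud_storage_destination(bucket: str) -> str:
    """Destination string for a Cloud Storage bucket."""
    return f"storage.googleapis.com/{bucket}"


def bigquery_destination(project: str, dataset: str) -> str:
    """Destination string for a BigQuery dataset."""
    return f"bigquery.googleapis.com/projects/{project}/datasets/{dataset}"


def pubsub_destination(project: str, topic: str) -> str:
    """Destination string for a Pub/Sub topic."""
    return f"pubsub.googleapis.com/projects/{project}/topics/{topic}"


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(log_bucket_destination("central-logs", "global", "security"))
    print(cloud_storage_destination("central-logs-archive"))
    print(bigquery_destination("central-logs", "audit_logs"))
    print(pubsub_destination("central-logs", "log-stream"))
```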
Folder and organization level sinks
Sinks can also be created at the folder or organization level. When such a sink is configured to include child resources, it is known as an aggregated sink.
- Folder-level sinks can aggregate logs from all child projects
- Organization-level sinks collect logs from the entire organization
This approach is commonly used for security, compliance, and long-term retention.
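A hedged sketch of what such a sink definition might look like follows. It builds the parent path and a `LogSink` body with `includeChildren` set, using field names from the Cloud Logging API; the organization ID, sink name, and destination are placeholders, and the request is only constructed, not sent.

```python
import json


def aggregated_sink(parent_type: str, parent_id: str, name: str,
                    destination: str, log_filter: str) -> dict:
    """Build the parent path and body for a folder- or org-level sink."""
    if parent_type not in ("folders", "organizations"):
        raise ValueError("parent_type must be 'folders' or 'organizations'")
    return {
        "parent": f"{parent_type}/{parent_id}",
        "body": {
            "name": name,
            "destination": destination,
            "filter": log_filter,
            # Without this flag the sink only sees logs written directly
            # at the folder or organization, not those of child projects.
            "includeChildren": True,
        },
    }


if __name__ == "__main__":
    # Placeholder IDs and names: route all audit logs to a central bucket.
    sink = aggregated_sink(
        "organizations", "123456789012", "org-audit-sink",
        "logging.googleapis.com/projects/central-logs"
        "/locations/global/buckets/security",
        'logName:"cloudaudit.googleapis.com"',
    )
    print(json.dumps(sink, indent=2))
```

After a sink is created, its writer identity still needs permission to write to the destination; that step is outside this sketch.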
Choosing log destinations
Different destinations serve different use cases.
- Log buckets are best for daily troubleshooting
- Cloud Storage is best for long-term archival
- BigQuery is best for analytics and reporting
- Pub/Sub is best for streaming logs to external systems
Outcome
A well-designed logging and monitoring strategy provides visibility, reliability, and security across Google Cloud environments. Centralizing observability enables teams to respond faster to issues, meet compliance requirements, and operate with confidence at scale.