gao.ninja

Logging and Monitoring

Logging and monitoring provide visibility into what is happening inside a Google Cloud environment. They are essential for troubleshooting, performance tuning, security investigations, and operational awareness.

Google Cloud provides native services for both through Google Cloud Observability (formerly Cloud Operations), which includes Cloud Logging and Cloud Monitoring.

Logging in Google Cloud

Cloud Logging collects and stores logs generated by Google Cloud services and workloads.

Many services, such as Compute Engine and GKE, generate platform logs automatically. Collecting operating system and application logs from virtual machines requires installing the Ops Agent.

By default, logs are stored in the project where they are generated, but they can be centralized for easier management.

Log capabilities

Cloud Logging supports the following capabilities:

  • Collecting logs from Google Cloud services
  • Ingesting operating system and application logs
  • Creating log-based metrics
  • Exporting logs to external destinations
  • Searching and analyzing logs in near real time
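Searches in Cloud Logging are written in the Logging query language. The Python sketch below only assembles a filter string that could be pasted into the Logs Explorer query field; the resource type, severity threshold, and start time are illustrative assumptions, and `build_log_filter` is a hypothetical helper rather than part of any Google library.

```python
from datetime import datetime, timezone

# Cloud Logging severity levels, from lowest to highest.
SEVERITIES = ["DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
              "ERROR", "CRITICAL", "ALERT", "EMERGENCY"]


def build_log_filter(resource_type: str, min_severity: str, since: datetime) -> str:
    """Assemble a Logging query language filter (hypothetical helper)."""
    if min_severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {min_severity}")
    # Express the start time as an RFC 3339 timestamp in UTC.
    start = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    clauses = [
        f'resource.type="{resource_type}"',
        f"severity>={min_severity}",
        f'timestamp>="{start}"',
    ]
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Errors or worse from Compute Engine VMs since 1 May 2024 (UTC).
    print(build_log_filter("gce_instance", "ERROR",
                           datetime(2024, 5, 1, tzinfo=timezone.utc)))
```

Because `severity>=ERROR` compares against the ordered severity scale, the same filter also matches CRITICAL, ALERT, and EMERGENCY entries.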

Log retention and cost

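Different log types have different retention periods and cost models. Retention for Admin Activity logs is fixed, while the thirty-day periods below are defaults that can be changed on the log bucket that stores the logs.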

  • Admin Activity logs are retained for four hundred days and are free
  • Data Access logs are retained for thirty days and are charged by volume
  • Network logs such as VPC Flow Logs are retained for thirty days and are charged by volume
  • Ops Agent logs are retained for thirty days and are charged by volume
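Because only some log types are billed, a rough monthly estimate only needs the charged volumes. The sketch below encodes the list above; `price_per_gib` and `free_gib` are hypothetical placeholders to be filled in from the current Cloud Logging pricing page, not published rates, and the log type keys are names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPolicy:
    retention_days: int  # default retention period in days
    charged: bool        # True if ingestion is billed by volume


# Values from the list above; the 30-day periods are defaults.
POLICIES = {
    "admin_activity": LogPolicy(400, False),
    "data_access": LogPolicy(30, True),
    "vpc_flow": LogPolicy(30, True),
    "ops_agent": LogPolicy(30, True),
}


def monthly_cost(volumes_gib: dict, price_per_gib: float, free_gib: float = 0.0) -> float:
    """Estimate a monthly bill from ingested GiB per log type.

    price_per_gib and free_gib are placeholders, not published rates.
    """
    billable = sum(gib for log_type, gib in volumes_gib.items()
                   if POLICIES[log_type].charged)
    # Only volume above any free allotment is billed.
    return max(billable - free_gib, 0.0) * price_per_gib


if __name__ == "__main__":
    usage = {"admin_activity": 5.0, "data_access": 20.0, "vpc_flow": 40.0}
    print(monthly_cost(usage, price_per_gib=0.5, free_gib=50.0))
```

The Admin Activity volume is ignored entirely, which mirrors the point above that those logs are free regardless of how many are generated.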

Types of logs

Google Cloud audit logs

Audit logs record actions performed on Google Cloud resources.

  • Admin Activity logs capture configuration changes and are always enabled
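  • Data Access logs record data read and write operations and are disabled by default for most services, BigQuery being a notable exception
  • System Event logs record configuration changes made by Google systems rather than by users, such as live migration of a virtual machine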
  • Access Transparency logs show when Google personnel access customer data for support purposes
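Each audit log type is written to its own log, distinguished by the suffix of the entry's `logName`. The sketch below assumes the naming pattern `.../logs/cloudaudit.googleapis.com%2F<type>`; the `access_transparency` suffix and the helper name `audit_log_kind` are assumptions to verify against the audit logging documentation.

```python
from typing import Optional
from urllib.parse import unquote

# Audit log suffixes under cloudaudit.googleapis.com (Access Transparency assumed).
AUDIT_TYPES = {
    "activity": "Admin Activity",
    "data_access": "Data Access",
    "system_event": "System Event",
    "access_transparency": "Access Transparency",
}


def audit_log_kind(log_name: str) -> Optional[str]:
    """Return the audit log type for a logName, or None if it is not an audit log."""
    _, found, log_id = log_name.partition("/logs/")
    if not found:
        return None
    # The log ID is URL-encoded, so "%2F" decodes to "/".
    service, _, suffix = unquote(log_id).partition("/")
    if service != "cloudaudit.googleapis.com":
        return None
    return AUDIT_TYPES.get(suffix)


if __name__ == "__main__":
    print(audit_log_kind(
        "projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"))
```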

Admin and user audit logs

These logs, available in the Google Admin console for Cloud Identity and Google Workspace, track changes made through identity and administration systems. They include actions such as user creation, group updates, and security configuration changes.

Because the Admin console retains these logs only for a limited period, exporting them is recommended for compliance and auditing.

Ops Agent logs

The Ops Agent provides visibility inside virtual machines. It collects system logs as well as logs from applications such as web servers and databases.

This allows teams to understand what is happening within the operating system rather than only at the infrastructure level.

Network logs

Network logs provide insight into traffic and connectivity.

  • VPC Flow Logs capture network flow metadata
  • Firewall Rules Logging records connections allowed or denied by firewall rules that have logging enabled
  • Cloud NAT logs track outbound internet connectivity
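VPC Flow Logs entries carry their flow metadata in `jsonPayload`. Assuming the documented fields `connection.src_ip`, `connection.dest_ip`, and `bytes_sent`, a sketch like the following could total traffic per source and destination pair from exported entries. The sample records and the `top_talkers` helper are hypothetical, and real entries contain many more fields.

```python
from collections import Counter


def top_talkers(entries: list, n: int = 3) -> list:
    """Total bytes_sent per source -> destination pair across flow log entries."""
    totals = Counter()
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        conn = payload.get("connection", {})
        pair = f"{conn.get('src_ip')} -> {conn.get('dest_ip')}"
        # 64-bit counters can appear as strings in JSON exports; int() accepts both.
        totals[pair] += int(payload.get("bytes_sent", 0))
    return totals.most_common(n)


if __name__ == "__main__":
    sample = [  # hypothetical, trimmed flow log entries
        {"jsonPayload": {"connection": {"src_ip": "10.0.0.2", "dest_ip": "10.0.0.9"},
                         "bytes_sent": "1200"}},
        {"jsonPayload": {"connection": {"src_ip": "10.0.0.2", "dest_ip": "10.0.0.9"},
                         "bytes_sent": "800"}},
        {"jsonPayload": {"connection": {"src_ip": "10.0.0.3", "dest_ip": "203.0.113.7"},
                         "bytes_sent": "500"}},
    ]
    print(top_talkers(sample))
```

At larger scale the same aggregation is usually easier to run as a SQL query after routing flow logs to BigQuery.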

Metrics and monitoring

In addition to logs, Google Cloud collects metrics that represent numerical time series data. Metrics are stored in Cloud Monitoring and are used to build dashboards and alerts.

Types of metrics

There are three main categories of metrics.

  • Google Cloud metrics collected automatically by the platform
  • Agent metrics collected by the Ops Agent from virtual machines
  • Custom metrics defined by applications and workloads
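The category of a metric is usually visible from the prefix of its metric type. The table below is a partial, illustrative sketch under that assumption: `agent.googleapis.com/` holds Ops Agent system metrics, `workload.googleapis.com/` is used by the Ops Agent for third-party application metrics (among other sources), `custom.googleapis.com/` holds user-defined metrics, and `logging.googleapis.com/user/` holds user-defined log-based metrics. Other prefixes exist and are not covered, and `metric_category` is a hypothetical helper.

```python
# Partial, illustrative prefix table; checked in order before the fallback.
PREFIXES = [
    ("agent.googleapis.com/", "Agent"),
    ("workload.googleapis.com/", "Agent"),          # also used by other OpenTelemetry sources
    ("custom.googleapis.com/", "Custom"),
    ("logging.googleapis.com/user/", "Log-based"),  # user-defined log-based metrics
]


def metric_category(metric_type: str) -> str:
    """Guess a metric's category from its metric type prefix."""
    for prefix, category in PREFIXES:
        if metric_type.startswith(prefix):
            return category
    # Remaining service domains (compute.googleapis.com, ...) are platform metrics.
    if ".googleapis.com/" in metric_type:
        return "Google Cloud"
    return "Unknown"


if __name__ == "__main__":
    for m in ["compute.googleapis.com/instance/cpu/utilization",
              "agent.googleapis.com/memory/percent_used",
              "custom.googleapis.com/checkout/latency"]:
        print(m, "->", metric_category(m))
```

Order matters: `logging.googleapis.com/user/` must be checked before the generic `.googleapis.com/` fallback, otherwise log-based metrics would be reported as platform metrics.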

Monitoring workflow

The monitoring workflow typically follows these steps:

  1. Metrics are ingested from services and agents
  2. Metrics are explored using Metrics Explorer
  3. Dashboards are created to visualize system health
  4. Alerting policies are configured to notify teams when thresholds are exceeded
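Charts and alerts in these steps rely on aligning raw points into regular windows and then testing a threshold over a duration. The sketch below imitates that idea with a mean over fixed alignment periods and a condition that must hold for several consecutive windows. It is a simplified model under those assumptions, not how Cloud Monitoring evaluates alerting policies internally, and `align` and `should_alert` are hypothetical names.

```python
from statistics import mean


def align(points: list, period: int) -> list:
    """Average (timestamp_seconds, value) points within fixed windows of `period` seconds.

    Windows with no points are simply skipped in this simplified model.
    """
    windows = {}
    for ts, value in points:
        windows.setdefault(ts // period, []).append(value)
    # One aligned value per window, in time order.
    return [mean(windows[w]) for w in sorted(windows)]


def should_alert(aligned: list, threshold: float, duration_windows: int) -> bool:
    """True if the last `duration_windows` aligned values all exceed the threshold."""
    if duration_windows < 1:
        raise ValueError("duration_windows must be at least 1")
    if len(aligned) < duration_windows:
        return False
    return all(v > threshold for v in aligned[-duration_windows:])


if __name__ == "__main__":
    # Hypothetical CPU utilization samples every 30 seconds.
    cpu = [(0, 0.5), (30, 0.7), (60, 0.9), (90, 0.95), (120, 0.85), (150, 0.95)]
    aligned = align(cpu, period=60)
    print(should_alert(aligned, threshold=0.8, duration_windows=2))
```

Requiring several consecutive windows above the threshold trades a slightly slower notification for far fewer alerts caused by a single spike.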

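Two key concepts when working with metrics are the resource type, which identifies the monitored resource (for example gce_instance for a Compute Engine VM), and the metric type, which identifies what is measured (for example compute.googleapis.com/instance/cpu/utilization).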
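Resource type and metric type are also the two fields most Cloud Monitoring filters start from. The sketch below assembles a time series filter, assuming the CPU utilization metric `compute.googleapis.com/instance/cpu/utilization` on `gce_instance` resources and an optional exact match on a resource label such as `zone`; `monitoring_filter` is a hypothetical helper.

```python
def monitoring_filter(metric_type: str, resource_type: str, **resource_labels: str) -> str:
    """Assemble a Cloud Monitoring time series filter (hypothetical helper)."""
    clauses = [
        f'metric.type = "{metric_type}"',
        f'resource.type = "{resource_type}"',
    ]
    # Optional exact matches on monitored resource labels, sorted for stable output.
    for key, value in sorted(resource_labels.items()):
        clauses.append(f'resource.labels.{key} = "{value}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(monitoring_filter("compute.googleapis.com/instance/cpu/utilization",
                            "gce_instance", zone="us-central1-a"))
```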

Centralized logging architecture

Although logs are generated per project, they do not need to be managed separately. Google Cloud supports centralized logging using log sinks.

Log Router and sinks

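Every project includes a Log Router, which checks each incoming log entry against the project's sinks. By default, the built-in _Required and _Default sinks keep logs in log buckets within the project where they are generated.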

Sinks can be created to copy logs to a central destination such as:

  • Log buckets
  • Cloud Storage
  • BigQuery
  • Pub/Sub
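Each destination type is addressed by a service-specific destination string, which is what a command such as `gcloud logging sinks create SINK_NAME DESTINATION --log-filter=FILTER` expects. The helper below sketches those formats; the project, bucket, dataset, and topic names are hypothetical, and the `kind` keywords are names chosen for illustration.

```python
def sink_destination(kind: str, project: str, name: str, location: str = "global") -> str:
    """Return the destination string a log sink expects for each supported service."""
    if kind == "log_bucket":
        return f"logging.googleapis.com/projects/{project}/locations/{location}/buckets/{name}"
    if kind == "cloud_storage":
        # Cloud Storage bucket names are globally unique, so no project is needed.
        return f"storage.googleapis.com/{name}"
    if kind == "bigquery":
        return f"bigquery.googleapis.com/projects/{project}/datasets/{name}"
    if kind == "pubsub":
        return f"pubsub.googleapis.com/projects/{project}/topics/{name}"
    raise ValueError(f"unsupported destination kind: {kind}")


if __name__ == "__main__":
    for kind, name in [("log_bucket", "security"), ("cloud_storage", "org-log-archive"),
                       ("bigquery", "org_logs"), ("pubsub", "log-stream")]:
        print(sink_destination(kind, "central-logging", name))
```

Whichever destination is chosen, the sink's writer identity (a service account) must be granted permission to write to it before entries start arriving.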

Folder and organization level sinks

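Sinks can also be created at the folder or organization level. When they are configured to include child resources, they are called aggregated sinks.

  • Folder-level sinks can aggregate logs from all child projects and folders
  • Organization-level sinks can collect logs from the entire organization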

This approach is commonly used for security, compliance, and long-term retention.
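How a folder or organization sink decides which entries it receives can be modeled as a walk up the resource hierarchy. The simulation below is a simplified sketch: the hierarchy, destinations, and the Python predicates standing in for log filters are all hypothetical, and it ignores exclusion filters and the default _Required and _Default sinks.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical hierarchy, mapping each resource to its parent.
PARENTS = {
    "projects/web-prod": "folders/prod",
    "projects/db-prod": "folders/prod",
    "folders/prod": "organizations/123",
}


@dataclass
class Sink:
    owner: str                 # resource the sink is created on
    destination: str
    include_children: bool = False
    # Stand-in for the sink's log filter; matches everything by default.
    matches: Callable[[dict], bool] = lambda entry: True


def ancestors(resource: str) -> list:
    """Return the folders and organization above a resource, nearest first."""
    chain = []
    while resource in PARENTS:
        resource = PARENTS[resource]
        chain.append(resource)
    return chain


def route(entry: dict, sinks: list) -> list:
    """Return the destinations that receive a copy of the entry."""
    source = entry["source"]
    above = ancestors(source)
    destinations = []
    for sink in sinks:
        own = sink.owner == source
        # Folder and organization sinks see child logs only with include_children.
        aggregated = sink.include_children and sink.owner in above
        if (own or aggregated) and sink.matches(entry):
            destinations.append(sink.destination)
    return destinations


if __name__ == "__main__":
    org_audit = Sink("organizations/123", "storage.googleapis.com/org-audit-archive",
                     include_children=True,
                     matches=lambda e: e["log"].startswith("cloudaudit.googleapis.com"))
    prod = Sink("folders/prod", "bigquery.googleapis.com/projects/central/datasets/prod",
                include_children=True)
    entry = {"source": "projects/web-prod", "log": "syslog"}
    print(route(entry, [org_audit, prod]))
```

Each matching sink receives its own copy, which is why one entry can land in both a security archive and a team dataset.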

Choosing log destinations

Different destinations serve different use cases.

  • Log buckets are best for day-to-day troubleshooting
  • Cloud Storage is best for long-term archival
  • BigQuery is best for analytics and reporting
  • Pub/Sub is best for streaming logs to external systems

Outcome

A well-designed logging and monitoring strategy provides visibility, reliability, and security across Google Cloud environments. Centralizing observability enables teams to respond faster to issues, meet compliance requirements, and operate with confidence at scale.

Google Cloud Onboarding Series

  1. Technical Onboarding Center
  2. Cloud Identity and Organization
  3. Users and Groups
  4. Administrative Access
  5. Resource Hierarchy
  6. Network Management
  7. Hybrid Connectivity
  8. Logging and Monitoring (current)
  9. Organizational Security
  10. Customer Care Portfolio