
CloudPro

45 Articles
Shreyans from Packt
01 Nov 2024
7 min read

A hard look at GuardDuty shortcomings

CloudPro #71: A hard look at GuardDuty shortcomings

⭐Masterclass:
From Docker Compose to Kubernetes Manifests
A hard look at GuardDuty shortcomings
Streamlining Keycloak in Kubernetes
The hater’s guide to Kubernetes
A skeptic's first contact with Kubernetes

🔍Secret Knowledge:
Enhancing Bitnami Helm Charts Security
Cloudflare adopted OpenTelemetry for logging pipeline
Josh Grose on LinkedIn: I spent the last 3 yrs outside of observability
Did you know the CNCF has an actual cookbook? Not metaphorically!
Unfashionably secure: why we use isolated VMs

🛠️HackHub: Best Tools for the Cloud
Web tool for database management
The devs are over here at devzat, chat over SSH!
CloudFormation_To_Terraform
Debugging tool for Kubernetes which tests and displays connectivity between nodes in the cluster.
Kubernetes network solution

Cheers,
Shreyans Singh
Editor-in-Chief

🔍Secret Knowledge: Learning Resources

Enhancing Bitnami Helm Charts Security
Bitnami enhanced the security of its Helm charts using Kubescape, an open-source Kubernetes security tool that identifies misconfigurations by comparing configurations to industry best practices. By integrating Kubescape into their build pipelines, Bitnami made significant improvements such as eliminating group root dependencies, configuring immutable filesystems, and reducing misconfigured resources.

Cloudflare adopted OpenTelemetry for logging pipeline
Cloudflare recently transitioned its logging pipeline from syslog-ng to OpenTelemetry Collector to enhance performance, maintainability, and telemetry insights. This move allowed the team to leverage Go, a language more familiar to their engineers, and integrate better observability through Prometheus metrics. Despite challenges like minimizing downtime during the switch and ensuring compatibility with existing infrastructure, the migration has opened up opportunities for further improvements, such as better log sampling and migration to the OpenTelemetry Protocol (OTLP).

Josh Grose on LinkedIn: I spent the last 3 yrs outside of observability
Josh Grose (ex-Principal PM, Splunk), after three years away from the observability space, was surprised to find that despite companies spending around 30% of their cloud budgets on monitoring, reliability hasn't improved significantly. He observed that even when Service Level Agreements (SLAs) are met, it often comes at the cost of developer productivity and experience. Engineering leaders are frustrated with the high costs and limited improvements in key metrics like Mean Time to Recovery (MTTR) and development speed, leading to the perception that observability has become an expensive and ineffective necessity.

Did you know the CNCF has an actual cookbook? Not metaphorically!
The "Cloud Native Community Cookbook" is a unique collection of recipes put together by the CNCF and Equinix Metal, born out of the increased time people spent at home during the COVID-19 pandemic. Instead of focusing on cloud technologies, this cookbook brings together food recipes shared by members of the Cloud Native community, originally exchanged in Equinix Metal's Slack channel.

Unfashionably secure: why we use isolated VMs
While modern cloud architectures often favor shared, multi-tenant environments for efficiency and scalability, Thinkst Canary opts for a less trendy but highly secure approach by using isolated virtual machines (VMs) for each customer. This choice prioritizes security by ensuring that each customer's data and services are completely separated, reducing the risk of cross-customer data breaches. Although this method comes with higher operational costs and complexity, it provides a stronger security boundary, making it easier to manage risks and sleep better at night.

⚡TechWave: Cloud News & Analysis

How Figma Migrated onto K8s in Less Than 12 months
Figma completed its migration to Kubernetes in under a year by meticulously planning and executing a well-scoped transition. Initially running services on AWS's ECS, Figma faced limitations such as complex stateful workloads and limited auto-scaling. The decision to move to Kubernetes (EKS) was driven by its broader functionality, including support for StatefulSets, Helm charts, and advanced scaling options from the CNCF ecosystem. By Q1 2024, Figma had migrated most core services with minimal impact on users, resulting in enhanced reliability, reduced costs, and a more flexible compute platform.

GitHub Copilot Autofix: Secure code 3x faster
Copilot Autofix, now available in GitHub Advanced Security (GHAS), is an AI-powered tool designed to help developers fix code vulnerabilities more than three times faster than manual methods. It analyzes vulnerabilities, explains their significance, and offers code suggestions for quick remediation. This accelerates the fixing process for both new vulnerabilities and existing security debt, significantly reducing the time and effort required for secure coding. Copilot Autofix is included by default for GHAS customers and is also available for open source projects starting in September.

New Kubernetes CPUManager Static Policy: Distribute CPUs Across Cores
Kubernetes v1.31 introduces a new alpha feature called "distribute-cpus-across-cores" for the CPUManager's static policy. This option aims to enhance performance by spreading allocated CPUs more evenly across physical cores, rather than clustering them on fewer cores. This reduces contention and resource sharing between CPUs on the same core, which can boost performance for CPU-intensive applications. To use this feature, users need to enable it through the kubelet's CPUManager policy options. Currently, it cannot be combined with other CPUManager options, but future updates are expected to address this limitation.

Announcing mandatory multi-factor authentication for Azure sign-in
Microsoft is making multi-factor authentication (MFA) mandatory for all Azure sign-ins to enhance security and protect against cyberattacks. Starting in the latter half of 2024, Azure users will need to use MFA to access the Azure portal and admin centers, with broader enforcement for other Azure tools like CLI and PowerShell set for early 2025. MFA, which adds an extra layer of security by requiring more than just a password, is shown to block over 99% of account compromises.

GitHub scales on demand with Azure Functions
GitHub faced scalability issues with its internal data pipeline, which struggled to handle the massive amount of data it collects daily. To address this, GitHub partnered with Microsoft to use Azure Functions' new Flex Consumption plan, which allows serverless functions to scale dynamically based on demand. This solution has enabled GitHub to efficiently process up to 1.6 million events per second, addressing their growth challenges and improving performance with minimal overhead.

🛠️HackHub: Best Tools for Cloud

commandprompt/pgmanage
PgManage is a modern graphical database client for PostgreSQL, focusing on management features and built on the now-dormant OmniDB project.

quackduck/devzat
Devzat is a chat service accessible via SSH that replaces the traditional shell prompt with a chat interface, allowing you to connect from any device with SSH capabilities.

aperswal/CloudFormation_To_Terraform
The CloudFormation to Terraform Converter is a tool that automates the migration of AWS CloudFormation templates to Terraform configuration files.

bloomberg/goldpinger
Goldpinger monitors Kubernetes networking by making calls between its instances and providing Prometheus metrics for visualization and alerts.

ZTE/Knitter
Knitter is a Kubernetes CNI plugin that supports multiple network interfaces for pods, allowing custom network configurations across various cloud environments.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply to this email.
Thanks for reading and have a great day!
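
To make the CPUManager item above more concrete, here is a toy Python model of the difference between packing CPUs onto as few physical cores as possible and distributing them across cores. It is only a sketch: the topology layout, the function names, and the round-robin strategy are assumptions made for illustration, and it is not the kubelet's actual allocation algorithm.

```python
def build_topology(num_cores, threads_per_core=2):
    """Map each physical core id to its logical CPU ids (hypothetical toy layout)."""
    return {
        core: [core * threads_per_core + t for t in range(threads_per_core)]
        for core in range(num_cores)
    }


def allocate_packed(topology, count):
    """Fill each physical core completely before moving to the next one."""
    cpus = [cpu for core in sorted(topology) for cpu in topology[core]]
    if count > len(cpus):
        raise ValueError("not enough CPUs in the topology")
    return cpus[:count]


def allocate_distributed(topology, count):
    """Take one logical CPU per physical core in turn (round-robin)."""
    total = sum(len(siblings) for siblings in topology.values())
    if count > total:
        raise ValueError("not enough CPUs in the topology")
    picked = []
    depth = 0  # which sibling thread of each core is taken in this pass
    while len(picked) < count:
        for core in sorted(topology):
            siblings = topology[core]
            if depth < len(siblings) and len(picked) < count:
                picked.append(siblings[depth])
        depth += 1
    return picked


def cores_used(topology, cpus):
    """Count how many distinct physical cores a set of logical CPUs touches."""
    owner = {cpu: core for core, siblings in topology.items() for cpu in siblings}
    return len({owner[cpu] for cpu in cpus})


if __name__ == "__main__":
    topo = build_topology(num_cores=4)
    for name, allocate in (("packed", allocate_packed), ("distributed", allocate_distributed)):
        cpus = allocate(topo, 4)
        print(name, cpus, "cores used:", cores_used(topo, cpus))
```

In this model, four CPUs requested on a four-core, two-thread machine share two physical cores when packed and occupy four when distributed, which is the contention trade-off the feature targets.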

Shreyans from Packt
25 Oct 2024
13 min read

Building Lightweight Kubernetes Dev Ephemeral Environments

CloudPro #70: Building Lightweight Kubernetes Dev Ephemeral Environments

Our Exclusive 2-for-1 Sale is LIVE!
For the next 24 hours only, you can secure 2 seats for the price of 1 at Generative AI in Action (Nov 11-13)!
📅 Sale ends tomorrow at 10 AM ET
Bring a colleague, friend, or your team and dive into everything this conference has to offer, from expert insights and hands-on sessions to valuable networking opportunities.
Act now. This deal won’t last long!⏳

Today we will talk about:

⭐Masterclass
Building Lightweight Kubernetes Dev Ephemeral Environments
From which Kubernetes pod (and namespace!) is this process that I see on my host?
Argo Workflows: Simplify parallel jobs: Container-native workflow engine for Kubernetes
Using SimKube 1.0: Comparing Kubernetes Cluster Autoscaler and Karpenter
I've joined a company that has an AKS cluster whose version is completely outdated (1.21). I need to upgrade it to version 1.30 without any downtime and have a rollback plan in place

🔍Secret Knowledge
Like Heroku, but You Own It
Multi-Metric Scaling
Goran Opacic on X: "After years of using @awscloud Aurora, we are moving back to dedicated hardware. MySQL K8s operators are great, storage is cheap, memory is cheap, cpu is cheap, I can run 5.7 as much as I like and no AI. I'll miss database cloning and instant read replicas"
Policy as Code in Terraform
Behind the scenes of the OpenTelemetry Governance Committee

⚡TechWave
EC2 Image Builder now supports building and testing macOS images
Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock
Grafana 11.3 release: Scenes-powered dashboards, visualization and panel updates, and more
Sonar Details OpenAPI Generator Flaw That Creates Source Code Vulnerability
HashiCorp Updates Terraform; Wider Cloud Infrastructure Developer Toolsets

🛠️HackHub
kubectl-guard: Accidentally modifying production instead of a local cluster? kubectl-guard helps prevent such critical mistakes.
kubesafe: Safely manage multiple Kubernetes clusters by defining safe contexts and protected commands.
tfreveal: An open-source tool that enhances Terraform plan visibility by showing all resource and output differences, including sensitive values.
SyncLite: A low-code platform for relational data consolidation, ideal for building data-intensive apps across edge, desktop, and mobile environments.
pg_replicate

Cheers,
Shreyans Singh
Editor-in-Chief

⭐MasterClass: Tutorials & Guides

Building Lightweight Kubernetes Dev Ephemeral Environments
Kardinal is an open-source tool for creating lightweight, temporary development environments on Kubernetes clusters. It’s designed to minimize resource usage by deploying only the services you need for testing while reusing existing resources when possible. Kardinal introduces “flows”: ephemeral environments that can be spun up for specific features or testing needs, which saves time and costs by avoiding redundant deployments.

From which Kubernetes pod (and namespace!) is this process that I see on my host?
To find which Kubernetes pod and namespace a process on your host belongs to, you can use crictl along with cgroups. First, get the process ID (PID) of the containerized process, then read its cgroup path, which contains the container’s unique identifier. Once you have that ID, run crictl inspect with go-template formatted output to retrieve both the pod’s namespace and its name directly.

Argo Workflows: Simplify parallel jobs: Container-native workflow engine for Kubernetes
In this guide, the focus is on Argo Workflows, an open-source tool designed to manage complex workflows in Kubernetes environments by orchestrating parallel tasks in containers. Each step of a workflow is run within a container, making it ideal for complex pipelines like data processing or machine learning. Argo Workflows integrates with Kubernetes services (e.g., volumes, secrets, and RBAC) and uses Directed Acyclic Graphs (DAGs) to sequence tasks. The guide explains how to deploy Argo on Amazon EKS and integrate it with Argo Events to handle data-driven tasks triggered by messages from Amazon SQS, creating a scalable, event-driven Spark job processing platform on Kubernetes.

Using SimKube 1.0: Comparing Kubernetes Cluster Autoscaler and Karpenter
SimKube 1.0, a Kubernetes simulator, was used to test two popular cluster autoscaling solutions: Kubernetes Cluster Autoscaler (KCA) and Karpenter. Both tools add nodes to a Kubernetes cluster based on workload demands, but they differ significantly in approach. KCA, originally designed for homogeneous clusters, must be configured with specific instance types, which can make it slower when there are many options. Conversely, Karpenter, designed by AWS, optimizes across all available EC2 instances by default and uses both a "fast" loop for quick scheduling and a "slow" loop for optimization, which made it faster in this simulation.

I've joined a company that has an AKS cluster whose version is completely outdated (1.21). I need to upgrade it to version 1.30 without any downtime and have a rollback plan in place
Upgrading an outdated AKS cluster from version 1.21 to 1.30 without downtime requires a careful approach, especially since rolling back AKS upgrades isn't possible. A Blue-Green deployment is a good option here but is complex at the cluster level. One way to approach it is to create a new cluster with AKS version 1.30, deploy and test the application there, and then redirect production traffic to the new cluster via DNS or load balancer once it is confirmed stable. First, validate the application’s compatibility with version 1.30 in your QA environment and ensure no critical API changes break functionality. If creating a new cluster is challenging due to resource limitations, consider a controlled maintenance window with a staged upgrade (e.g., from 1.21 to 1.22, then to 1.24, and so on), but remember that jumping directly across versions carries risk, because deprecations and other breaking changes in the skipped releases go untested.

🔍Secret Knowledge: Learning Resources

Like Heroku, but You Own It
Dokku is an open-source platform as a service (PaaS) that lets you turn a virtual private server (VPS) into a serverless platform, similar to Heroku, but with more control and no subscription costs. It allows easy deployment of web apps using Docker containers, GitHub Actions, or simple git commands. With features like auto-scaling, built-in SSL from Let’s Encrypt, and password protection, Dokku is ideal for hosting both applications and static sites from private repositories. Additionally, it offers flexible deployment options and can integrate with Cloudflare for HTTPS if needed, making it a powerful, budget-friendly solution for personal or small-scale app hosting.

Multi-Metric Scaling
Yelp has implemented multi-metric autoscaling on its PaaSTA platform, enabling services to scale based on multiple factors (like CPU and request load) rather than just one, which improves stability and speeds up recovery during high-demand periods. Since PaaSTA is an 11-year-old platform on Kubernetes, updating it safely was challenging. The team spent weeks understanding the codebase, gathering input, and defining a clear, gradual update plan. They used snapshot testing and strict validation to confirm stability at each step, made minimal yet crucial API adjustments, and improved monitoring through Grafana. Ultimately, the update rolled out smoothly, enhancing scaling options without causing any service interruptions.

Goran Opacic on X: "After years of using @awscloud Aurora, we are moving back to dedicated hardware. MySQL K8s operators are great, storage is cheap, memory is cheap, cpu is cheap, I can run 5.7 as much as I like and no AI. I'll miss database cloning and instant read replicas"

Policy as Code in Terraform
Policy as Code (PaC) allows organizations to enforce rules and guidelines on infrastructure automatically by defining policies as code, ensuring resources meet security, compliance, and operational standards. Tools like HashiCorp Sentinel and Open Policy Agent (OPA) are popular frameworks for PaC, working with infrastructure as code (IaC) tools like Terraform. Unlike traditional IaC, which configures infrastructure, PaC sets up policy rules that are enforced whenever infrastructure changes are proposed. This approach helps maintain a secure, compliant cloud environment by preventing risky configurations.

Behind the scenes of the OpenTelemetry Governance Committee
The OpenTelemetry Governance Committee (GC) guides the OpenTelemetry project strategically, ensuring its growth as a vendor-neutral observability framework. While the Technical Committee (TC) focuses on technical aspects, the GC's role includes setting project goals, updating policies, and overseeing SIG (Special Interest Group) sponsorships, ensuring alignment with community needs. GC members also represent OpenTelemetry at events, mediate conflicts, and check in with SIG maintainers to address challenges and gather feedback.

⚡TechWave: Cloud News & Analysis

EC2 Image Builder now supports building and testing macOS images
AWS EC2 Image Builder now supports creating macOS images, enabling users to streamline their image management and automate the creation of "golden images" (customized bootable OS images) for macOS in addition to Windows and Linux. This is particularly helpful for developers using macOS tools like Xcode and Fastlane, which are essential in CI/CD pipelines. With Image Builder, users can create components for specific tools, define a recipe for a base macOS image, configure infrastructure (like EC2 Mac Dedicated Hosts), and set up pipelines that automatically test and validate each image.

Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock
Anthropic's latest updates to the Claude 3.5 model family in Amazon Bedrock include an upgraded Claude 3.5 Sonnet, which enhances the model’s ability to handle complex software engineering tasks, knowledge-based Q&A, data extraction, and task automation at the same cost as previous versions. Additionally, a new "computer use" feature, available in public beta, allows Claude 3.5 Sonnet to interact with computer interfaces, like opening applications, typing, and clicking, opening up possibilities for AI-driven automation in software testing and administrative workflows. Lastly, the upcoming Claude 3.5 Haiku will offer faster response times paired with strong reasoning abilities, ideal for applications requiring both speed and intelligence, such as customer service and data processing in sectors like finance and healthcare.

Grafana 11.3 release: Scenes-powered dashboards, visualization and panel updates, and more
Grafana 11.3 introduces a range of new features and improvements, with a highlight on the new Scenes-powered dashboards, enhancing stability, flexibility, and organization of dashboard elements. This release also includes visual and functional updates, like a redesigned inspect feature for table cells, enabling quick data analysis, and the new "Actions" option, allowing users to trigger API calls directly from elements on canvas panels. The update further enhances alerting with simplified rule creation and RBAC for notifications, and Explore Logs is now a default feature, making log troubleshooting more accessible.

Sonar Details OpenAPI Generator Flaw That Creates Source Code Vulnerability
Sonar recently identified a vulnerability in the OpenAPI Generator, a popular tool for creating API libraries, that could allow attackers to read or delete files in certain directories. Although a patch has been released, many existing APIs built with older, unpatched versions might still be at risk, requiring DevSecOps teams to locate and update them. This vulnerability underscores the challenge of detecting security flaws in auto-generated code, where developers may be less involved in the underlying code creation process. With cybercriminals actively searching for such vulnerabilities, DevSecOps teams must prioritize remediating high-risk code while balancing limited resources.

HashiCorp Updates Terraform; Wider Cloud Infrastructure Developer Toolsets
HashiCorp, which IBM has agreed to acquire, announced significant updates to Terraform at HashiConf, focusing on streamlining multi-cloud infrastructure management. Terraform's new "stacks" feature allows developers to manage complex, interdependent infrastructure configurations, making it easier to scale and control cloud resources across multiple environments. Additionally, HCP Waypoint provides a structured portal for internal development, using templates to standardize application deployment and updates. Other enhancements include new lifecycle management capabilities for HCP Vault, GPU resource sharing in Nomad, and an automation tool for migrating Terraform workflows, all designed to optimize and automate infrastructure in an increasingly complex cloud landscape.

🛠️HackHub: Best Tools for Cloud

kubectl-guard: Accidentally modifying production instead of a local cluster? kubectl-guard helps prevent such critical mistakes.
To set up *kubectl-guard*, first create a file named *kubectl-guard* for the script, then make it executable by running `chmod +x kubectl-guard`. Next, open your shell configuration file (e.g., `~/.zshrc`) in a text editor, and add an alias with the command `alias kubectl='full-path-to/kubectl-guard'`, replacing "full-path-to" with the actual path where the script is saved. Save and close the file, then restart your terminal session for changes to take effect. The safeguard relies on the production cluster name including "prod"; you can adjust this by modifying the `PROD_IDENTIFIER` variable.

kubesafe: Safely manage multiple Kubernetes clusters by defining safe contexts and protected commands.
*Kubesafe* is a tool designed to help you avoid running risky commands on the wrong Kubernetes cluster by marking certain contexts as "safe" and defining commands that need confirmation before execution. It works with any Kubernetes CLI tool (like `kubectl` or `helm`) by wrapping the command to add this layer of protection. For instance, running `kubesafe kubectl delete pod my-pod` will prompt for confirmation if the context is marked as protected. You can set up aliases, such as `alias kubectl='kubesafe kubectl'`, to automatically use Kubesafe each time you run a command.

tfreveal: An open-source tool that enhances Terraform plan visibility by showing all resource and output differences, including sensitive values.
*tfreveal* is an open-source tool that lets you see all changes, including sensitive values, in Terraform plan files, enhancing transparency in infrastructure updates. While Terraform hides sensitive data by default, tfreveal reveals these details, which is particularly useful for detecting drift between Terraform state and actual infrastructure. Typically, sensitive data can only be viewed through complex JSON outputs, which are hard to read, especially when changes are in large encoded values. tfreveal simplifies this by displaying clear diffs that show all values. To use it, generate a plan file with `terraform plan -out plan.out`, then pipe it to tfreveal via `terraform show -json plan.out | tfreveal`.

SyncLite: A low-code platform for relational data consolidation, ideal for building data-intensive apps across edge, desktop, and mobile environments.
SyncLite is an open-source, low-code platform for creating data-intensive applications that seamlessly consolidate and synchronize data across edge, desktop, and mobile environments. It supports real-time, transactional data replication from various sources, like embedded databases (e.g., SQLite, DuckDB) and IoT message brokers, and integrates with popular data destinations, such as databases, data warehouses, and data lakes.

pg_replicate
`pg_replicate` is a Rust library designed to help developers quickly set up data replication from PostgreSQL to various data systems. It simplifies the use of PostgreSQL’s logical streaming replication protocol, letting users focus on building data pipelines without dealing with protocol details. To get started, users create a PostgreSQL publication, run the stdout example to replicate data to standard output, and connect using simple commands.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply to this email.
Thanks for reading and have a great day!
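
Returning to the Masterclass item on mapping a host process to its pod: the first half of that procedure (PID to container ID via the cgroup path) can be sketched in Python. This is a minimal sketch under assumptions: it presumes the runtime embeds a 64-character hexadecimal container ID in the cgroup path, and the sample path, pod UID, and function names are made up for illustration. The crictl go-template from the original guide is not reproduced here.

```python
import re

# Assumption: the container runtime embeds a 64-character hex container ID
# somewhere in the process's cgroup path.
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text):
    """Return the first 64-hex-character ID found in cgroup file content, or None."""
    for line in cgroup_text.splitlines():
        match = CONTAINER_ID_RE.search(line)
        if match:
            return match.group(0)
    return None


def read_cgroup(pid):
    """Read /proc/<pid>/cgroup for a process on a Linux host."""
    with open(f"/proc/{pid}/cgroup") as handle:
        return handle.read()


# Illustrative cgroup line; the pod UID and container ID are invented placeholders.
SAMPLE_CGROUP = (
    "0::/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-pod1234abcd_0000_0000_0000_000000000000.slice/"
    "cri-containerd-" + "ab12" * 16 + ".scope\n"
)

if __name__ == "__main__":
    print(container_id_from_cgroup(SAMPLE_CGROUP))
```

On a real node you would call `container_id_from_cgroup(read_cgroup(pid))` and pass the resulting ID to `crictl inspect` to look up the pod's namespace and name, as the item describes.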

Shreyans from Packt
15 Nov 2024
7 min read

Unlock Kubernetes Savings with Kubecost’s Automated Actions

CloudPro #73: Unlock Kubernetes Savings with Kubecost’s Automated Actions

Shouldn't GenAI be doing all the cyber crap jobs by now?
Learn about the latest in GenAI for vulnerability management, exposure management and cyber-asset security when you attend the CyberRisk Summit. This free, virtual event on Wednesday, Nov. 20 includes expert speakers from Yahoo, Wells Fargo, IBM, Vulcan Cyber and more. This is the ninth semi-annual CyberRisk Summit. Attendees can request CPE credits, and all registrants get access to the session recordings. Join us!
Register for free

⭐Masterclass
The Kubernetes gap in CNAPP
Unlock Kubernetes Savings with Kubecost’s Automated Actions
How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
How to migrate an observability platform to open-source and cut costs

🔍Secret Knowledge
Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
Complete Guide to Logging in Golang with slog
Scaling Prometheus with Thanos
Automated container CVE and vulnerability patching using Trivy and Copacetic
Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik

🛠️HackHub
Production-ready Kubernetes distribution for both public and private cloud
Application Performance Monitoring System
Graceful shutdown and Kubernetes readiness / liveness checks for any Node.js HTTP applications
Toolkit for Integrating with your kubernetes dev environment more efficiently
Backup your Kubernetes Stateful Applications

Cheers,
Shreyans Singh
Editor-in-Chief

Protect Your .NET Applications with Dotfuscator: Stop Reverse Engineering and Secure Your IP
Your .NET applications face constant threats from reverse engineering, leaving your proprietary code, sensitive logic, and IP exposed. But with Dotfuscator by PreEmptive, you can safeguard your software. Dotfuscator’s advanced obfuscation features, such as renaming, control flow obfuscation, and string encryption, harden your code against tampering, unauthorized access, and IP theft.
Take control of your application’s security and keep your code and intellectual property secure. Empower your development process with Dotfuscator today, because your .NET apps deserve protection that lasts.
Start Free Trial

⭐MasterClass: Tutorials & Guides

The Kubernetes gap in CNAPP
Initially, CNAPPs focused on integrating various cloud security tools and supporting enterprises during early cloud adoption. As a result, their Kubernetes protection often lacks depth and focuses mainly on surface-level issues like container vulnerabilities, without addressing the complexities of Kubernetes clusters, such as control plane security or runtime policies. This has led to a false sense of security in cloud environments, as CNAPPs fail to offer robust Kubernetes-specific features.

Unlock Kubernetes Savings with Kubecost’s Automated Actions
Kubecost's new automated actions help users save money in their Kubernetes environments by optimizing resource usage with minimal effort. With features like automated request sizing, cluster turndown, and namespace turndown, Kubecost identifies inefficiencies like over-provisioned containers and shuts down unused clusters or namespaces. Users can set schedules for automating these actions, reducing waste and freeing up resources.

How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
WebAssembly (Wasm) components enable Kubernetes to extend seamlessly across multi-cloud, edge, and other distributed environments by providing a lightweight, portable way to run applications across any architecture. Wasm components, similar to containers, can be written in various languages and connected through shared APIs, allowing for greater flexibility and efficiency. By integrating with Kubernetes through wasmCloud, a Wasm-native orchestrator, organizations can enhance their cloud-native setups without changing existing infrastructure.

How to migrate an observability platform to open-source and cut costs
Migrating an observability platform to open-source can significantly reduce costs while maintaining control over telemetry data, but it requires careful planning and execution. This process involves identifying essential telemetry data, selecting an open-source stack for logs, metrics, and traces, conducting proofs-of-concept (POCs) across different systems, and ensuring compatibility with various architectures, such as microservices. The migration also includes reconfiguring alerts and dashboards, validating the new setup, and updating related systems like notification and incident management tools.

🔍Secret Knowledge: Learning Resources

Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
This book provides practical guidance on using GitOps to automate and manage Kubernetes deployments in cloud-native environments like AWS and Azure. It explains core GitOps principles, tools like Argo CD and Flux, and strategies for implementing CI/CD pipelines. The book also covers infrastructure automation with Terraform, security best practices, and observability while addressing cultural transformations in IT for GitOps adoption. By the end, readers will have skills to apply GitOps in scaling, monitoring, and securing Kubernetes deployments efficiently.

Complete Guide to Logging in Golang with slog
In Go, structured logging can be efficiently implemented using the `slog` package, introduced in Go 1.21. `slog` allows for more organized and detailed log entries by formatting logs as key-value pairs, making them easier to search, filter, and analyze. The package provides flexibility with logging levels (like Debug, Info, Warn, and Error) and supports both text-based and JSON-formatted output. Key components include Loggers, Records, and Handlers, which define how logs are created, stored, and processed.

Scaling Prometheus with Thanos
Scaling Prometheus with Thanos allows for long-term storage, cost savings, and a global view of metrics in large environments. While Prometheus is great for short-term monitoring, it struggles with long-term storage and querying across multiple clusters. Thanos extends Prometheus by using components like Thanos Query, Sidecar, and Store Gateway to enable scalable, highly available storage through object stores, reducing Prometheus's resource consumption. It also supports downsampling to optimize storage and query performance.

Automated container CVE and vulnerability patching using Trivy and Copacetic
Automating container vulnerability patching with Trivy and Copacetic (copa) helps protect your applications from potential attacks by scanning and patching container images automatically. Trivy scans container images for vulnerabilities, generating a report in JSON format, while Copacetic reads this report and patches the container image based on detected vulnerabilities. Once patched, the image is rebuilt and rescanned to ensure all vulnerabilities have been fixed.

Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik
In Kubernetes with k3s, cert-manager, and Traefik, you can create a self-signed root Certificate Authority (CA) to manage TLS certificates locally, which is useful when your cluster isn't exposed to the internet (e.g., no Let's Encrypt). The process involves setting up cert-manager to automate the issuance, renewal, and secret management of these certificates. You first create a self-signed root CA, which then signs an intermediate CA, and that intermediate CA signs leaf certificates for your services. This setup allows your services to have trusted certificates locally.

🛠️HackHub: Best Tools for Cloud

labring/sealos
Sealos is a cloud operating system built on the Kubernetes kernel, designed to simplify managing cloud-native applications. It offers quick deployment of distributed applications and high-availability databases like MySQL, PostgreSQL, and MongoDB.

apache/skywalking
Apache SkyWalking is an open-source Application Performance Monitoring (APM) system designed for microservices, cloud-native, and container-based architectures. It offers end-to-end distributed tracing, service observability, and diagnostic tools, supporting various programming languages like Java, .NET, PHP, and Python.

godaddy/terminus
Terminus is a Node.js package that helps manage graceful shutdowns and Kubernetes health checks for HTTP applications. It provides readiness and liveness checks to inform Kubernetes about the service’s health status.

alibaba/kt-connect
KT-Connect is a tool that helps developers efficiently connect, redirect, and expose local applications to Kubernetes clusters for easier testing and development.

stashed/stash
Stash by AppsCode is a cloud-native backup and recovery solution for Kubernetes workloads, making it easier to back up and restore data like volumes and databases in dynamic Kubernetes environments.
It simplifies the backup process using tools like restic and Kubernetes CSI Driver VolumeSnapshotter.📢 If your company is interested in reaching an audience of developers and, technical professionals, and decision makers, you may want toadvertise with us.If you have any comments or feedback, just reply back to this email.Thanks for reading and have a great day!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}.reverse{display:table;width: 100%;

Shreyans from Packt
06 Sep 2024
9 min read

Google Cloud has launched Memorystore for Valkey

Red Hat Enterprise Linux AI Now Generally Available

CloudPro #63: Google Cloud has launched Memorystore for Valkey

200+ hours of research on AI-led career growth strategies & hacks packed in 3 hours
The only AI Crash Course you need to master 20+ AI tools, multiple hacks & prompting techniques in just 3 hours.
You’ll save 16 hours every week & find remote jobs using AI that will pay you up to $10,000/mo.
Get It Here

⭐Masterclass
[Sponsored] 200+ hours of research on AI-led career growth strategies & hacks packed in 3 hours
The Kubernetes gap in CNAPP
Unlock Kubernetes Savings with Kubecost’s Automated Actions
How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
How to migrate an observability platform to open-source and cut costs

🔍Secret Knowledge
Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
Complete Guide to Logging in Golang with slog
Scaling Prometheus with Thanos
Automated container CVE and vulnerability patching using Trivy and Copacetic
Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik

⚡Techwave
Red Hat Enterprise Linux AI Now Generally Available
Kubernetes 1.31: Streaming Transitions from SPDY to WebSockets
Google Cloud has launched Memorystore for Valkey
Palo Alto Networks acquires IBM QRadar SaaS assets
Broadcom Adds On-Premises Edition of Project Management Application

🛠️Hackhub
Production-ready Kubernetes distribution for both public and private cloud
Application Performance Monitoring System
Graceful shutdown and Kubernetes readiness / liveness checks for any Node.js HTTP applications
Toolkit for Integrating with your kubernetes dev environment more efficiently
Backup your Kubernetes Stateful Applications

Cheers,
Shreyans Singh
Editor-in-Chief

Live Webinar: The Power of Data Storytelling in Driving Business Decisions (September 10, 2024 at 9 AM CST)
Data doesn’t have to be overwhelming. Join our webinar to learn about Data Storytelling and turn complex information into actionable insights for faster decision-making.
Click below to check the schedule in your time zone and secure your spot. Can't make it? Register to get the recording instead.
REGISTER FOR FREE

Forward to a Friend

⭐MasterClass: Tutorials & Guides

The Kubernetes gap in CNAPP
Initially, CNAPPs focused on integrating various cloud security tools and supporting enterprises during early cloud adoption. As a result, their Kubernetes protection often lacks depth and focuses mainly on surface-level issues like container vulnerabilities, without addressing the complexities of Kubernetes clusters, such as control plane security or runtime policies. This has led to a false sense of security in cloud environments, as CNAPPs fail to offer robust Kubernetes-specific features.

Unlock Kubernetes Savings with Kubecost’s Automated Actions
Kubecost's new automated actions help users save money in their Kubernetes environments by optimizing resource usage with minimal effort. With features like automated request sizing, cluster turndown, and namespace turndown, Kubecost identifies inefficiencies like over-provisioned containers and shuts down unused clusters or namespaces. Users can set schedules for automating these actions, reducing waste and freeing up resources.

How WebAssembly components extend the frontiers of Kubernetes to multi-cloud, edge, and beyond
WebAssembly (Wasm) components enable Kubernetes to extend across multi-cloud, edge, and other distributed environments by providing a lightweight, portable way to run applications on any architecture. Wasm components, similar to containers, can be written in various languages and connected through shared APIs, allowing for greater flexibility and efficiency. By integrating with Kubernetes through wasmCloud, a Wasm-native orchestrator, organizations can enhance their cloud-native setups without changing existing infrastructure.

How to migrate an observability platform to open-source and cut costs
Migrating an observability platform to open-source can significantly reduce costs while maintaining control over telemetry data, but it requires careful planning and execution. This process involves identifying essential telemetry data, selecting an open-source stack for logs, metrics, and traces, conducting proofs-of-concept (POCs) across different systems, and ensuring compatibility with various architectures, such as microservices. The migration also includes reconfiguring alerts and dashboards, validating the new setup, and updating related systems like notification and incident management tools.

🔍Secret Knowledge: Learning Resources

Implementing GitOps with Kubernetes: Automate, manage, scale, and secure infrastructure and cloud-native applications on AWS and Azure
This book provides practical guidance on using GitOps to automate and manage Kubernetes deployments in cloud-native environments like AWS and Azure. It explains core GitOps principles, tools like Argo CD and Flux, and strategies for implementing CI/CD pipelines. The book also covers infrastructure automation with Terraform, security best practices, and observability while addressing cultural transformations in IT for GitOps adoption. By the end, readers will have skills to apply GitOps in scaling, monitoring, and securing Kubernetes deployments efficiently.

Complete Guide to Logging in Golang with slog
In Go, structured logging can be implemented with the `slog` package, introduced in Go 1.21. `slog` allows for more organized and detailed log entries by formatting logs as key-value pairs, making them easier to search, filter, and analyze. The package provides flexibility with logging levels (like Debug, Info, Warn, and Error) and supports both text-based and JSON-formatted output. Key components include Loggers, Records, and Handlers, which define how logs are created, stored, and processed.

Scaling Prometheus with Thanos
Scaling Prometheus with Thanos allows for long-term storage, cost savings, and a global view of metrics in large environments. While Prometheus is great for short-term monitoring, it struggles with long-term storage and querying across multiple clusters. Thanos extends Prometheus by using components like Thanos Query, Sidecar, and Store Gateway to enable scalable, highly available storage through object stores, reducing Prometheus's resource consumption. It also supports downsampling to optimize storage and query performance.

Automated container CVE and vulnerability patching using Trivy and Copacetic
Automating container vulnerability patching with Trivy and Copacetic (copa) helps protect your applications from potential attacks by scanning and patching container images automatically. Trivy scans container images for vulnerabilities, generating a report in JSON format, while Copacetic reads this report and patches the container image based on detected vulnerabilities. Once patched, the image is rebuilt and rescanned to ensure all vulnerabilities have been fixed.

Self-signed Root CA in Kubernetes with k3s, cert-manager and traefik
In Kubernetes with k3s, cert-manager, and Traefik, you can create a self-signed root Certificate Authority (CA) to manage TLS certificates locally, which is useful when your cluster isn't exposed to the internet (e.g., no Let's Encrypt). The process involves setting up cert-manager to automate the issuance, renewal, and secret management of these certificates. You first create a self-signed root CA, which then signs an intermediate CA, and that intermediate CA signs leaf certificates for your services. This setup allows your services to have trusted certificates locally.

Developing for iOS? Setapp's 2024 report on the state of the iOS market in the EU is a must-see
How do users in the EU find apps? What's the main source of information about new apps? Would users install your app from a third-party app marketplace?
Set yourself up for success with these and more valuable marketing insights in Setapp Mobile's report iOS Market Insights for EU.
Get Insights free

⚡TechWave: Cloud News & Analysis

Red Hat Enterprise Linux AI Now Generally Available
Red Hat Enterprise Linux (RHEL) AI is now available, providing an open-source platform for developing and running generative AI models across hybrid cloud environments. It combines efficient models, such as the Granite LLM family, and tools like InstructLab to help align models with specific business needs. RHEL AI allows domain experts, not just data scientists, to contribute to AI models, making them more accessible and cost-effective.

Kubernetes 1.31: Streaming Transitions from SPDY to WebSockets
In Kubernetes 1.31, the default streaming protocol used by kubectl has shifted from the outdated SPDY protocol to the more modern and widely supported WebSocket protocol. Streaming protocols in Kubernetes enable persistent, real-time communication between the client and server, which is useful for operations like running commands inside a container. The switch to WebSockets improves compatibility with modern proxies and gateways, so that commands like `kubectl exec`, `kubectl cp`, and `kubectl port-forward` function smoothly across different environments.

Google Cloud has launched Memorystore for Valkey
Google Cloud has launched Memorystore for Valkey, a fully managed, high-performance key-value service that is 100% open-source. Valkey 7.2 is compatible with Redis 7.2 and offers features like zero-downtime scaling, persistence, and integration with Google Cloud. It's designed to meet the demand for open-source data management, providing users with an alternative to Redis for use cases like caching and session management. Valkey is gaining popularity due to its performance and scalability, and Google Cloud plans to expand its capabilities further with Valkey 8.0, which promises even better performance and reliability.

Palo Alto Networks acquires IBM QRadar SaaS assets
Palo Alto Networks has acquired IBM's QRadar SaaS assets to enhance their joint AI-powered security solutions, aiming to help organizations strengthen their cybersecurity operations. This partnership will simplify threat detection, improve security automation, and deliver next-generation security operations at scale. IBM will support seamless migrations to Palo Alto's Cortex XSIAM platform.

Broadcom Adds On-Premises Edition of Project Management Application
At VMware Explore 2024, Broadcom introduced an on-premises version of its Rally project management application, called Rally Anywhere, to give organizations more control over their data. This version is especially valuable for industries with strict regulations or concerns about ransomware targeting SaaS platforms. Rally Anywhere offers an alternative to Atlassian’s Jira, which has discontinued its on-premises Server edition, and helps organizations meet data sovereignty requirements.

🛠️HackHub: Best Tools for Cloud

labring/sealos
Sealos is a cloud operating system built on the Kubernetes kernel, designed to simplify managing cloud-native applications. It offers quick deployment of distributed applications and high-availability databases like MySQL, PostgreSQL, and MongoDB.

apache/skywalking
Apache SkyWalking is an open-source Application Performance Monitoring (APM) system designed for microservices, cloud-native, and container-based architectures. It offers end-to-end distributed tracing, service observability, and diagnostic tools, supporting various programming languages like Java, .NET, PHP, and Python.

godaddy/terminus
Terminus is a Node.js package that helps manage graceful shutdowns and Kubernetes health checks for HTTP applications. Terminus also provides readiness and liveness checks to inform Kubernetes about the service’s health status.

alibaba/kt-connect
KT-Connect is a tool that helps developers efficiently connect, redirect, and expose local applications to Kubernetes clusters for easier testing and development.

stashed/stash
Stash by AppsCode is a cloud-native backup and recovery solution for Kubernetes workloads, making it easier to back up and restore data like volumes and databases in dynamic Kubernetes environments. It simplifies the backup process using tools like restic and Kubernetes CSI Driver VolumeSnapshotter.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply to this email.

Thanks for reading and have a great day!

Shreyans from Packt
18 Oct 2024
11 min read

AI agents invade observability: snake oil or the future of SRE?

I created DevOps Interview Preparation Lab based on Interviews from Microsoft, Airbnb, Accenture

CloudPro #69: AI agents invade observability

Join Generative AI In Action now with a Full Event Pass for just $239.99 (40% off the regular price) with code FLASH40.
BOOK TODAY AT $239.99 (regular price $399.99)
Three Reasons Why You Cannot Miss This Event:
- Network with 25+ Leading AI Experts
- Gain Insights from 30+ Dynamic Talks and Hands-On Sessions
- Engage with Experts and Peers through 1:1 Networking, Roundtables, and AMAs
Act fast: this FLASH SALE is only for a limited number of seats!
CLAIM NOW - LIMITED SEATS

Today we will talk about:

⭐Masterclass
AI agents invade observability: snake oil or the future of SRE?
I created DevOps Interview Preparation Lab based on Interviews from Microsoft, Airbnb, Accenture, and others
QA's Dead: Where Do We Go From Here?
Convert OpenTelemetry Traces to Metrics using SpanMetrics Connector
Reduce Network Traffic Costs in Your Kubernetes Cluster

🔍Secret Knowledge
SQLite on Rails
Just use Postgres
Why I still Self-Host my Servers
Essays on programming I think about a lot
A detailed guide to cron jobs

⚡Techwave
How Google fine-tuned Gemma model for Flipkart
AWS has launched Console to Code: tool that generates code
Bring your conversations to WhatsApp with AWS End User Messaging Social
Introducing pipe syntax in BigQuery and Cloud Logging
GCloud Database Center: AI-powered, unified fleet management solution preview now open to all customers

🛠️Hackhub
agnost-gitops: Open source GitOps platform running on Kubernetes clusters
kube-downscaler: Scale down Kubernetes deployments after work hours
AWS Mine: honey token system designed to generate AWS access keys
TinyStatus: A simple, customizable status page generator that monitors and displays the status of services on a responsive web page.
Litecli: A command-line client for SQLite databases, featuring auto-completion and syntax highlighting.

Cheers,
Shreyans Singh
Editor-in-Chief

Looking to build, train, deploy, or implement Generative AI?
Meet Innodata, offering high-quality solutions for developing and implementing industry-leading generative AI, including:
With 5,000+ in-house SMEs and expansion and localization supported across 85+ languages, Innodata drives AI initiatives for enterprises globally.
Learn More

⭐MasterClass: Tutorials & Guides

AI agents invade observability: snake oil or the future of SRE?
This article explores how AI, particularly agentic AI, is transforming the field of observability and monitoring. Traditional monitoring tools use dashboards, alerts, and data insights to help developers and operators manage system health, but new AI agents are designed to act more like team members. These agents, powered by large language models (LLMs), can analyze operational data and automate tasks like incident response and maintenance.

I created DevOps Interview Preparation Lab based on Interviews from Microsoft, Airbnb, Accenture, and others
This hands-on lab is designed to help you prepare for DevOps interviews by walking you through key tools and topics: Python web apps, Docker, Kubernetes, Helm Charts, GitHub Actions for CI/CD, and Ingress Controllers. It's practical, not theory-based, and helps you build a project from scratch through containerization, deployment, and CI/CD setup.

QA's Dead: Where Do We Go From Here?
The concept of traditional QA (Quality Assurance) has evolved, shifting responsibility for software quality from a separate QA team to developers themselves. In the old model, QA was a distinct stage that came after development, causing delays, inefficiencies, and higher costs due to late bug detection. Now, with agile methodologies and advanced tooling, testing is integrated throughout the development process. Developers take ownership of quality, using tools like automated testing, CI/CD pipelines, and instant feedback mechanisms. QA isn't dead; instead, it has become an essential part of every developer's role, with QA professionals moving into either technical automation roles or higher-level strategic positions.

Convert OpenTelemetry Traces to Metrics using SpanMetrics Connector
The SpanMetrics Connector in OpenTelemetry converts trace data into actionable metrics, which is useful when robust tracing is in place but metrics instrumentation is lacking. It works by extracting metrics from spans (units of trace data) and aggregating them into key performance indicators like request counts, errors, and durations. This unified approach simplifies observability by reducing the need for separate instrumentation for traces and metrics. By configuring the connector, developers can generate custom metrics, optimize system performance, and enhance monitoring without increasing overhead or complexity.

Reduce Network Traffic Costs in Your Kubernetes Cluster
To reduce network traffic costs in a Kubernetes cluster, it's important to minimize cross-availability zone (AZ) traffic, which can increase latency and lead to higher data transfer costs. Strategies to reduce this include intelligent node placement, ensuring related pods are located in the same AZ to avoid unnecessary data transfer. Topology-aware routing ensures traffic is directed within the same AZ, while using local persistent volumes keeps data close to the pods accessing it. Pod topology spread constraints help evenly distribute pods across zones, further minimizing cross-AZ communication and improving both performance and cost-efficiency.

🔍Secret Knowledge: Learning Resources

SQLite on Rails
Running SQLite on Rails can provide good performance, but out of the box it isn’t optimized for high-concurrency production environments. This is mainly due to SQLite’s single-write locking mechanism, which can cause errors and bottlenecks when multiple threads attempt to write at the same time. However, by fine-tuning configurations (setting immediate transactions, adjusting busy timeouts, and managing connection pools), Rails apps can achieve resilient performance. Advanced techniques, such as using custom busy handlers and write-ahead logging (WAL), further enhance concurrency and minimize delays, making SQLite on Rails a viable production option.

Just use Postgres
The article argues that when building a new application requiring persistent storage, Postgres should be your default choice. It highlights why other databases might not be ideal: SQLite is great for single-machine apps but limited for distributed systems, NoSQL databases like MongoDB require rigid access patterns, and newer databases like XTDB pose long-term risks. Postgres offers flexibility, scalability, and a rich ecosystem of tools, making it a reliable and efficient choice for most web applications without the trade-offs of other databases.

Why I still Self-Host my Servers
Two reasons: independence and learning. Hosting his own services lets the author stay free from corporate control and subscriptions while teaching valuable skills that benefit his career as a software engineer. From managing a Proxmox cluster and Pi-Hole DNS servers to troubleshooting outages and hardware issues, the experience forces him to dive deeper into the technical aspects of system administration. This continuous learning has proven useful in handling complex distributed systems at work. Despite the challenges, like hardware failures and occasional crashes, the lessons learned make it worthwhile.

Essays on programming I think about a lot
This list highlights several key programming essays that have deeply impacted the author's thinking and engineering approach. These essays cover various topics, from understanding complex systems, choosing stable technology, and managing abstractions, to hiring strong engineering teams and designing scalable distributed systems. The recurring theme is thoughtful, pragmatic decision-making in software engineering, advocating for simplicity, clear abstraction boundaries, and understanding the deeper layers of technology. Each essay provides timeless insights that have shaped the author's work habits, and the list invites others to explore and reflect on these ideas for themselves.

A detailed guide to cron jobs
A cron job is a scheduled task or command in Unix-based systems, like Linux and macOS, that automates repetitive processes such as backups, email sending, or database updates. Cron jobs use a specific time-based syntax to determine when and how often the task should run. This guide explains how to set up, edit, and manage cron jobs, including the syntax, adding new jobs, and checking their logs. It also covers methods for monitoring cron jobs, such as using logs, monitoring tools, and email alerts to ensure tasks run as expected without system issues.

⚡TechWave: Cloud News & Analysis

How Google fine-tuned Gemma model for Flipkart
The blog describes the process of fine-tuning Gemma, an instruction-tuned AI model, for a conversational shopping assistant. It starts with data preparation using a subset of Flipkart’s product catalog, filtering for clothing items and generating Q&A pairs based on product details. Fine-tuning was achieved using LoRA, a parameter-efficient method, with multiple iterations on both pre-trained and instruction-tuned models. The fine-tuning was scaled using multi-GPU setups on Google Kubernetes Engine (GKE). Hyperparameter tuning was also crucial to optimize model performance, ensuring the chatbot provides accurate, contextual responses.

AWS has launched Console to Code: tool that generates code
AWS has launched "Console to Code," a tool that simplifies the process of moving from prototyping in the AWS Management Console to writing production-ready code. This tool automatically captures actions taken in the console and generates code in formats like CLI, CloudFormation, and CDK, following AWS best practices. It helps users quickly create reusable, automation-friendly code without needing to manually write it, streamlining the transition from console use to Infrastructure-as-Code (IaC). This service is available for key AWS services like EC2, VPC, and RDS.

Bring your conversations to WhatsApp with AWS End User Messaging Social
AWS has introduced "End User Messaging Social," allowing developers to send messages to their users on WhatsApp, the world’s most popular messaging app. With this tool, developers can create rich, interactive messaging experiences that include multimedia content. WhatsApp can now be used alongside SMS and Push notifications, giving businesses multiple ways to reach their audience. Setting up WhatsApp messaging is easy, with options to create a new WhatsApp Business Account or link an existing one, all within the AWS console.

Introducing pipe syntax in BigQuery and Cloud Logging
Google Cloud has introduced a new "pipe syntax" in BigQuery and Cloud Logging, designed to simplify log data queries. This new syntax uses a pipe symbol (|>) to break down complex SQL queries into clear, easy-to-read steps, improving the readability and writability of log analysis tasks. With this syntax, users can quickly filter, aggregate, and explore log data, making it easier to extract insights. BigQuery’s enhanced performance features, like faster numeric search indexes and better handling of JSON data, further streamline log analysis. Pipe syntax is now available in preview.

GCloud Database Center: AI-powered, unified fleet management solution preview now open to all customers
Google Cloud has launched Database Center, an AI-powered solution that simplifies managing large, complex database fleets. It provides a unified interface for monitoring and optimizing databases like Cloud SQL, AlloyDB, and Spanner. Database Center helps businesses detect and address performance and security issues with proactive recommendations, ensuring smoother operations and better compliance with industry standards. It also includes AI-powered chat for quick troubleshooting and optimization insights, allowing users to improve performance, reduce costs, and strengthen security across their entire database landscape.

🛠️HackHub: Best Tools for Cloud

agnost-gitops: Open source GitOps platform running on Kubernetes clusters
Agnost GitOps is an open-source platform for continuous deployment (CD) on Kubernetes clusters. It automates the process of building, deploying, and managing applications by connecting your GitHub, GitLab, or Bitbucket repository. When you push new code, Agnost builds a Docker image using Kaniko and deploys it to your Kubernetes cluster.

kube-downscaler: Scale down Kubernetes deployments after work hours
Kube-downscaler is a Kubernetes tool designed to automatically scale down or pause workloads (like Deployments, StatefulSets, and HorizontalPodAutoscalers) during non-work hours, helping organizations save on cloud costs. It operates based on a configurable schedule of uptime and downtime, using Kubernetes annotations or command-line options.

AWS Mine: honey token system designed to generate AWS access keys
The "aws-mine" project is a honey token system designed to generate AWS access keys that can be strategically placed in various locations to lure and detect potential attackers. If someone attempts to use these keys, the system sends a notification within about four minutes, allowing you to investigate the source and assess whether the asset has been compromised.

TinyStatus: A simple, customizable status page generator that monitors and displays the status of services on a responsive web page.
It checks the status of HTTP endpoints, pings hosts, and monitors open ports, displaying results on a clean and responsive web page. The system is configured using YAML files, and it supports both light and dark themes, as well as incident history tracking.

Litecli: A command-line client for SQLite databases, featuring auto-completion and syntax highlighting.
Upon first use, LiteCLI generates a configuration file that can be customized for user preferences. It streamlines database interactions by predicting commands and formatting output, enhancing the command-line experience for SQLite users.

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply to this email.

Thanks for reading and have a great day!

Shreyans from Packt
30 Aug 2024
10 min read

Kubernetes 1.31: Fine-grained SupplementalGroups control

Announcing Terraform Google Provider 6.0.0

CloudPro #62: Kubernetes 1.31 Fine-grained SupplementalGroups control

Quick Start Kubernetes
  • Understand what Kubernetes is and why it's essential
  • Learn the inner workings of Kubernetes architecture
  • Get hands-on with deploying and managing applications
  • Set up Kubernetes and containerize applications
GET IT FOR $18.99 $12.99

⭐Masterclass: Unlock the Full Potential of Kubernetes for Scalable Application Management
  • Kubernetes pod and container restarting
  • Better Kubernetes YAML Editing with (Neo)vim
  • Monitoring kubernetes events with kubectl and Grafana Loki
  • Practical Logging for PHP Applications with OpenTelemetry
  • Using 1Password with External Secrets Operator in a GitOps way

🔍Secret Knowledge:
  • Build your own SQS or Kafka with Postgres
  • Revealing the Inner Structure of AWS Session Tokens
  • An Opinionated Ramp Up Guide to AWS Pentesting
  • Gang scheduling pods on Amazon EKS using AWS Batch multi-node processing jobs
  • Application Availability Depends on Dependencies

⚡Techwave:
  • Kubernetes 1.31: Fine-grained SupplementalGroups control
  • Announcing Terraform Google Provider 6.0.0
  • New capabilities in VMware Private AI Foundation with NVIDIA
  • GitLab Announces the General Availability of GitLab Duo Enterprise
  • Grafana 11.2 release: new updates for data sources, visualizations, transformations, and more

🛠️HackHub: Best Tools for the Cloud
  • PostgreSQL cloud native High Availability and more
  • Kubernetes Operator to automate Helm, DaemonSet, StatefulSet & Deployment updates
  • Runs and manages databases, message queues, etc on K8s
  • Powerful workflow engine and end-to-end pipeline solutions implemented with native Kubernetes resources
  • configure kubernetes objects on multiple clusters using jsonnet

Cheers,
Shreyans Singh
Editor-in-Chief

Mobile Banking Apps: Secure SDKs Aren’t Enough (Webinar)
Is your mobile banking app truly secure? Join our webinar to learn why relying solely on protected SDKs leaves your app vulnerable.
Discover real-world scenarios where emerging vulnerabilities can compromise your app despite using a protected SDK. We'll cover multi-layered protection strategies and practical solutions to guard against reverse engineering, tampering, and malware. Gain actionable insights on using obfuscation, data encryption, and real-time application self-protection (RASP) to safeguard your app. Equip yourself with practical solutions to ensure comprehensive app security and safeguard your business from financial and regulatory risks.
REGISTER NOW

⭐MasterClass: Tutorials & Guides

Kubernetes pod and container restarting
In Kubernetes, a Pod is the smallest deployable unit, often containing one or more containers. When a container or pod needs to be restarted due to errors or updates, Kubernetes offers several methods to do so. For example, you can restart a Pod by deleting it, and Kubernetes will automatically recreate it if it’s part of a Deployment. Alternatively, you can restart a specific container within a Pod using commands like `kubectl exec` for more precise control. These features allow Kubernetes to maintain high availability and resilience in a cloud environment.

Better Kubernetes YAML Editing with (Neo)vim
Editing Kubernetes YAML files can be tricky, but using Neovim, a modern version of Vim, can make it much easier. Neovim is lightweight, highly customizable, and integrates well with your terminal, making it ideal for DevOps and platform engineers. By configuring Neovim specifically for YAML files, you can set up features like auto-indentation, syntax highlighting, folding, and autocompletion, all of which help reduce errors and improve efficiency.

Monitoring kubernetes events with kubectl and Grafana Loki
In Kubernetes, monitoring events is crucial for understanding the status and issues related to Pods, WorkerNodes, and other components.
You can use `kubectl` to view these events directly, or you can enhance your monitoring setup by integrating Kubernetes events with Grafana Loki. By capturing events as logs using a tool like the `k8s-event-logger`, which listens to the Kubernetes API, you can store them in Loki, create metrics with RecordingRules, and visualize them in Grafana.

Practical Logging for PHP Applications with OpenTelemetry
Practical logging for PHP applications using OpenTelemetry involves instrumenting your PHP code to collect and correlate log data with other observability signals like traces and metrics. This approach is particularly useful in microservices-based architectures, where understanding the interactions between different services is crucial for maintaining system stability. By using OpenTelemetry, developers can standardize how telemetry data is collected and exported, reducing complexity.

Using 1Password with External Secrets Operator in a GitOps way
To manage secrets securely in a GitOps environment using Kubernetes, you can integrate 1Password with the External Secrets Operator. This setup allows you to automatically fetch and inject secrets stored in 1Password into your Kubernetes cluster. By using tools like ArgoCD, Helm, or FluxCD, you can deploy and manage this integration efficiently. The External Secrets Operator pulls secrets from 1Password via 1Password Connect, a proxy that ensures availability and reduces API requests.

🔍Secret Knowledge: Learning Resources

Build your own SQS or Kafka with Postgres
You can build your own version of SQS (Simple Queue Service) or Kafka using PostgreSQL by setting up tables and queries that mimic the functionality of these popular message queues and streams. For SQS, you create a table to store messages, with columns that help manage message visibility, delivery attempts, and order.
You can then write queries to insert messages, retrieve them while respecting visibility timeouts, and delete them after processing. For Kafka, you expand this setup by storing messages persistently and keeping track of where each consumer group is in the message stream, allowing multiple consumers to process messages independently and in parallel, similar to Kafka's partitioning system.

Revealing the Inner Structure of AWS Session Tokens
By reverse engineering these tokens, the research team developed tools to analyze and modify them programmatically. This allowed them to uncover previously unknown details about AWS's cryptography and authentication protocols. Their findings showed that while AWS's security measures are robust, understanding the structure of these tokens can help defenders better protect against potential attacks. Additionally, the research raises questions about the privacy and integrity of these tokens.

An Opinionated Ramp Up Guide to AWS Pentesting
Lizzie Moratti's "Opinionated Ramp Up Guide to AWS Pentesting" offers a detailed roadmap for becoming proficient in AWS pentesting, emphasizing practical experience over certifications. The guide is tailored for those with a foundational understanding of networking and security, and it stresses the importance of broad knowledge before delving into deeper cloud-specific skills. The guide also touches on industry pitfalls, such as reliance on automated tools and the challenges of cloud pentesting in a fast-evolving environment.

Gang scheduling pods on Amazon EKS using AWS Batch multi-node processing jobs
AWS Batch now supports multi-node parallel (MNP) jobs for Amazon EKS, allowing you to gang schedule pods across multiple nodes for tasks that require extensive computation, like machine learning or weather forecasting. Previously, MNP jobs were only available on Amazon ECS.
With this update, you can use AWS Batch on EKS to run distributed processing jobs, such as those with Dask, a Python library for parallel computing. The setup involves defining job configurations that include a main node running the scheduler and worker nodes executing the tasks. This approach ensures efficient communication and scaling across nodes, streamlining complex computations in a managed environment.

Application Availability Depends on Dependencies
Modern applications depend on various services and components, meaning their reliability is tightly linked to the uptime of these dependencies. For example, if an application like Tekata.io needs to maintain 99.9% uptime, but it relies on several services with only 99.9% uptime each, the combined effect could reduce Tekata.io’s overall availability. To hit the desired uptime, dependencies need to have even higher availability. The formula \( A = U^N \) shows that if your application’s target uptime is 99.9% and it has 7 dependencies, each dependency must have an uptime of 99.99% to meet that target.

⚡TechWave: Cloud News & Analysis

Kubernetes 1.31: Fine-grained SupplementalGroups control
In Kubernetes 1.31, a new feature called `supplementalGroupsPolicy` was introduced to give better control over how supplementary group IDs are handled in Pods. Previously, Kubernetes automatically included group memberships defined in the container’s `/etc/group` file, which could lead to unexpected group IDs being applied and potentially cause security or access issues. With this update, you can now specify a `Strict` policy that only includes the group IDs explicitly set in the Pod's manifest, excluding any additional groups defined in the container image.

Announcing Terraform Google Provider 6.0.0
The Terraform Google Provider 6.0.0 introduces several enhancements for better management of Google Cloud resources.
Key updates include the option to opt out of a default label ("goog-terraform-provisioned") that identifies Terraform-managed resources, improved protection against accidental resource deletion with new deletion protection fields, and increased flexibility with longer name prefixes for resources.

New capabilities in VMware Private AI Foundation with NVIDIA
Key updates in VMware Private AI include a Model Store for secure LLM management, a streamlined deployment process, and new NVIDIA capabilities like NIM Agent Blueprints for custom AI workflows. Future updates will include better GPU management, advanced data indexing and retrieval services, and tools for building AI agents.

GitLab Announces the General Availability of GitLab Duo Enterprise
GitLab has launched GitLab Duo Enterprise, an AI-powered add-on designed to enhance the software development lifecycle for DevSecOps teams. Priced at $39 per user per month, this tool integrates advanced AI features to improve code generation, security vulnerability detection, and team collaboration. It builds on the capabilities of GitLab Duo Pro by adding enterprise-focused tools like vulnerability resolution, root cause analysis, and AI impact dashboards.

Grafana 11.2 release: new updates for data sources, visualizations, transformations, and more
Notable additions include support for new data sources like Yugabyte and Amazon Managed Service for Prometheus, updates to visualizations such as standardized tooltips and pagination for state timelines, and improvements in transformations like data transposing and enhanced template variable support. The release also includes better alerting features, integration improvements for OAuth and SAML providers, and a migration assistant for easier transition to Grafana Cloud.
🛠️HackHub: Best Tools for Cloud

sorintlab/stolon
Stolon is a cloud-native tool designed to manage PostgreSQL databases with high availability, making it suitable for deployment in various environments including Kubernetes and traditional infrastructures. It leverages PostgreSQL's streaming replication and integrates with cluster stores like etcd, Consul, or Kubernetes for leader election and data storage.

keel-hq/keel
Keel is a lightweight tool for automating updates to Kubernetes deployments without needing complex command-line interfaces or APIs. It integrates directly with Kubernetes and Helm, using labels and annotations to manage updates based on semantic versioning policies.

apecloud/kubeblocks
KubeBlocks is an open-source tool designed to simplify the management of multiple database types on Kubernetes using a unified set of APIs. Instead of dealing with different operators for each database, KubeBlocks provides a single control plane to manage various databases such as PostgreSQL, Redis, and Kafka. It offers a standardized approach to database lifecycle management, day-2 operations, and observability, with support for backup, recovery, and monitoring.

caicloud/cyclone
Cyclone is a workflow engine built for Kubernetes that manages end-to-end pipelines without requiring extra dependencies. It operates across various Kubernetes environments, including public, private, and hybrid clouds. Cyclone offers features like DAG graph scheduling, flexible parameterization, and integration with external systems. It supports triggers, multi-cluster execution, multi-tenancy, and automatic resource cleanup.

splunk/qbec
Qbec is a CLI tool designed for managing Kubernetes objects across multiple clusters or namespaces using jsonnet, a data-templating language. It simplifies Kubernetes configuration management by allowing users to define and deploy objects in various environments efficiently. Qbec is similar to tools like kubecfg and ksonnet.
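
The availability formula quoted in the "Application Availability Depends on Dependencies" item of this issue, \( A = U^N \), can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and it assumes that every dependency has the same uptime, that failures are independent, and that the application is up only when all N dependencies are up.

```python
def combined_availability(dependency_uptime: float, n_dependencies: int) -> float:
    """A = U^N: availability when all N dependencies must be up at the same time."""
    return dependency_uptime ** n_dependencies


def required_dependency_uptime(target_availability: float, n_dependencies: int) -> float:
    """Invert A = U^N to get the per-dependency uptime U = A^(1/N)."""
    return target_availability ** (1.0 / n_dependencies)


if __name__ == "__main__":
    n = 7  # number of dependencies, as in the newsletter's example
    print(f"7 dependencies at 99.9% each:  {combined_availability(0.999, n):.4%}")
    print(f"uptime needed per dependency:  {required_dependency_uptime(0.999, n):.4%}")
    print(f"7 dependencies at 99.99% each: {combined_availability(0.9999, n):.4%}")
```

Under those assumptions, seven dependencies at 99.9% each give only about 99.30% overall, and hitting a 99.9% target needs roughly 99.986% per dependency, which is where the rounded 99.99% figure comes from.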
Shreyans from Packt
19 May 2025
8 min read

Kubernetes v1.33 Fixes a 10-Year-Old Image Pull Loophole

The Lost Fourth Pillar of Observability

CloudPro #92

Sponsored: Most GenAI projects die in the proof-of-concept stage. This session by Rubrik shows you how to push past that👇
Save Your Spot

This week’s CloudPro issue has got a bunch of things I’ve either run into myself or seen others get tripped up by:
📌AWS defaults that quietly expose more than they should
📌a Kubernetes bug that’s been around for ten years
📌GitHub Actions setups that look fine until someone finds a way in

There’s also a couple posts you'll find helpful, like building a CI/CD pipeline that’s actually fast, or understanding how containers really run under the hood.

Hope a few of these come in handy when you need them.

Also: we’re planning another special issue for next week. Any ideas on what we should dive into, or an expert you’d love to hear from? Just reply back to this email, I’d really like to hear what you think.

Cheers,
Shreyans Singh
Editor-in-Chief

🔐 Cloud Security

Amazon GuardDuty Malware Protection for EC2 now available in AWS GovCloud (US) Regions
Amazon has released malware protection for EC2 in AWS GovCloud (US) regions. It scans EBS volumes attached to EC2 instances and container workloads to detect potential malware. The system supports both automatic scans based on suspicious behavior and manual scans using the EC2 instance's ARN. It works without adding any new software and does not impact workload performance.

Amazon VPC adds CloudTrail logging for VPC resources created by default
Amazon VPC now logs creation and deletion of default resources, like Security Groups, Route Tables, and Network ACLs, when a VPC is created or deleted. Previously, CloudTrail only captured explicitly created resources, making audits harder. This update helps teams improve governance and track changes more easily.

Guardrails for Your Cloud: A Simple Guide to OPA and Terraform
This post shows how to use OPA to block risky Terraform changes like unencrypted S3 buckets or open security groups.
It explains how to write Rego policies, run checks on Terraform plans, and enforce standards like required tags and deployment restrictions. Helpful for adding policy-as-code guardrails to IaC workflows.

Shadow Roles: AWS Defaults Can Lead to Service Takeover
This research shows how default AWS service roles, like those for SageMaker, Glue, and EMR, often come with overly broad S3 permissions, such as AmazonS3FullAccess. Attackers can abuse these defaults to escalate privileges and compromise other services. Real-world scenarios include model-based attacks via Hugging Face and cross-service takeovers through default IAM roles.

Hardening GitHub Actions: Lessons from Recent Attacks
Two recent supply chain attacks exploited weak GitHub Actions workflows, compromising popular repos via over-permissive settings and exposed secrets. The report urges tighter defaults: set tokens to read-only, limit third-party Actions, avoid risky triggers like pull_request_target, and never expose secrets to forks. It also warns self-hosted runners can be dangerous if shared or persistent.

Build Your Own AI Agents Over The Weekend
Join the live "Building AI Agents Over the Weekend" Workshop starting on June 21st and build your own agent in 2 weekends. In this workshop, the instructors will guide you through building a fully functional autonomous agent and show you exactly how to deploy it in the real world.
BOOK NOW AND SAVE 35%
Use Code AGENT35 at checkout

⚙️ Infrastructure & DevOps

Redis Is Open Source Again
Redis has shifted back to an open source license (AGPLv3) for Redis 8 after a year under more restrictive licenses meant to block cloud providers from monetizing it freely.
The pivot follows the rise of the Valkey fork, backed by AWS and Google, and a recognition that Redis had lost favor with parts of the developer community.

37signals Says Goodbye to AWS: Full S3 Migration and $10M in Projected Savings
37signals has fully migrated 18 PB of data off AWS S3 to its own Pure Storage-based infrastructure, ending over a decade on the platform. AWS waived the $250K egress fee, aligning with EU Data Act requirements. The company expects to cut infrastructure costs from $3.2M to under $1M annually, saving over $10M in five years.

Docker Explained: Finally Understand Containers Without Losing Your Mind (Probably)
This post explains how Docker packages your code and dependencies into isolated containers that run the same everywhere. It covers Dockerfiles, images, layers, and containers with clear examples. Useful for devs struggling with environment issues during deployment.

How I Tuned My CI/CD Pipeline To Be Done in 60 Seconds
A solo developer reduced their GitHub Actions CI/CD pipeline from over 5 minutes to under 60 seconds using parallel jobs, caching, and Makefile tuning. They optimized builds, tests, and linting while managing GitHub's billable minutes. The result: fast, repeatable deploys with zero YAML debugging overhead.

Ultimate DevOps Roadmap 2025: Learn Automation, Containerization
This guide lays out a step-by-step DevOps learning plan for 2025, covering scripting, cloud, CI/CD, Kubernetes, IaC, and AIOps. It includes timelines, open-source tools, and free resources for each topic. Useful for engineers building a modern, automation-driven skillset from scratch.

📦 Kubernetes & Cloud Native

Kubernetes v1.33 Fixes a 10-Year-Old Image Pull Loophole
Kubernetes v1.33 closes a decade-old loophole that let pods reuse cached private images without valid pull credentials. With a new Kubelet flag, image access is now authorized even if the image already exists on the node.
This improves security in multi-tenant clusters using private registries.

Announcing etcd v3.6.0
The first etcd minor release in four years adds full downgrade support, better memory efficiency, and removes the deprecated v2store. It introduces Kubernetes-style feature gates, livez/readyz probes, and SIG-etcd governance under Kubernetes. A 50% memory drop and ~10% throughput boost make it the most optimized and robust release to date.

Kubernetes API Groups Explained Like You’re 5: Why They Matter (With Real Examples)
This post simplifies Kubernetes API groups using familiar YAML examples like apps/v1 and rbac.authorization.k8s.io/v1. It breaks down how resources are grouped and versioned to help engineers better navigate manifests. A useful primer for anyone confused by Kubernetes API structure.

Kubernetes Production Checklist
This post offers a detailed checklist of proven Kubernetes production best practices, from health checks and autoscaling to RBAC, secrets, and observability. It covers what really matters for keeping systems secure, resilient, and scalable in real-world environments.

Building Kubernetes (a lite version) from scratch in Go
This project walks through building a simplified Kubernetes clone in Go, recreating the control plane, scheduler, and kubelet logic using HTTP APIs and in-memory storage. It’s a hands-on way to demystify how reconciliation loops and pod lifecycles work under the hood.

🔍 Observability & SRE

Introducing the OTTL Playground for OpenTelemetry
Elastic has launched OTTL Playground, a browser-based tool for testing OpenTelemetry Transformation Language (OTTL) statements in real time. It lets users run processors like transform and filter, view diffs, logs, and JSON outputs, and safely test transformations without affecting production.
It’s built with WebAssembly and offers shareable config links for easier collaboration.

Last9 MCP Server: Fix Production Issues in Your Local Environment
Last9 has launched MCP Server, a tool that brings real production exceptions (with full context) into your local dev environment. It captures stack traces, request parameters, and environment variables so bugs can be reproduced and fixed precisely where you're coding. It integrates with AI agents in editors like Claude (via Cursor, Windsurf) to auto-suggest fixes, cutting debug time by over 35%.

The Lost Fourth Pillar of Observability
CloudQuery argues that configuration data, unlike logs, metrics, and traces, offers crucial insights without needing instrumentation. It’s high-cardinality, API-collected, and best stored relationally. Monitoring config data helps track security posture, compliance, cost leaks, and infrastructure drift. Integrating it with traditional observability sharpens root cause analysis and preemptive alerting.

A tcpdump Tutorial with Examples
Daniel Miessler’s tutorial breaks down tcpdump into 50 real-world examples for capturing and analyzing network traffic. From filtering by IP, port, and protocol to saving captures and flag-specific filters, it’s a compact field guide for security engineers and SREs. Great for fast, precise troubleshooting from the command line.

How Kubernetes Runs Containers: A Practical Deep Dive
This tutorial breaks down how Kubernetes runs containers by tracing a pod’s lifecycle on a Linux VM using k3s, crictl, and pstree. It shows how pods are just Linux processes isolated by namespaces and cgroups, with container runtimes like containerd managing their lifecycle.
This clarity helps engineers debug resource limits, network issues, and process isolation at a low level.

Shreyans from Packt
26 May 2025
8 min read

Bash vs Python in Cloud Infrastructure

By Donald Tevault

CloudPro #93

This week’s CloudPro is a guest special from Donald Tevault, the author of The Ultimate Linux Shell Scripting Guide.

He’s written today's newsletter on Bash vs Python in Cloud Infrastructure, and why shell scripting still wins in a bunch of real-world cases. He compares real tasks in both languages and shows how often Bash just gets you there faster, with less setup and fewer surprises.

If you want to go deeper, CloudPro readers get 40% off the ebook for the next 72 hours. Just use the code CLOUDPRO at checkout.

Cheers,
Shreyans Singh
Editor-in-Chief

GET 40% OFF on eBOOK

Bash vs Python in Cloud Infrastructure, by Donald Tevault

Python is a great programming language, and you can do a lot of awesome stuff with it.

However, you might at times find that Python is more than you need, or more than you can easily learn. For such jobs, you might want to consider shell scripting instead.

Let’s look at some specific reasons:

To begin with, Python might not be installed on every workstation, server, IoT device, or container that you need to administer.

On the other hand, every Linux, Unix, or Unix-like system that you’ll encounter has a shell already installed. Apart from bash, you’ll find Bourne Shell on BSD-type systems, and lightweight shells such as ash or dash on Linux for IoT devices.

Now, let’s say that you have a Linux-based IoT device and you need to parse through its webserver logs. The tools you’ll need to do this with shell scripting are already there, but Python likely isn’t.

Shell Script Portability versus Python Portability

The difference between bash and the other shells that I mentioned is that bash has some advanced features that the other shells lack.
If you know that you’ll only need to run your scripts in a bash environment, then you can definitely take advantage of the extra bash features.

But, by avoiding the bash-specific features, you can create shell scripts that will run on a wide variety of shells, including bash, ash, dash, or Bourne Shell. Fortunately, that’s not as hard as it would seem. For example, you can create variable arrays in bash, but not in the other shells. If you need cross-shell portability but also need the benefits of using an array, you can easily create a construct that simulates an array, and that has the same functionality. It’s easy-peasy once you know how.

One portability problem with Python involves Python’s use of programming libraries that might or might not be installed on every device on which the Python script needs to run. In fact, you might have encountered this problem yourself if you’ve ever downloaded a Python script from GitHub. If you’re not a Python expert, it might take you a while to figure out how to install all of the required libraries.

With shell scripting, you don’t need to worry about libraries, because shell scripts use the command-line utilities that already come installed on pretty much every Linux, Unix, or Unix-like system. Another problem is that scripts that were created for the old Python 2 aren’t always compatible with the new Python 3.

Next, let’s talk about something that’s especially important to me personally.

The Shell Scripting Learning Curve versus the Python Learning Curve

If you’re a DevOps person, you’ve likely already mastered Python. But, if you’re more into systems administration, there’s a good chance that you haven’t had much experience with Python. Fear not, because even if you’re lousy with learning programming languages, as I am, you can still learn how to do some awesome things with old-fashioned shell scripting.
Even if you do know Python, you might find that certain jobs can be accomplished more quickly and easily with shell scripting than with Python.

For example, here’s the script that I use to update and shut down the OpenMandriva workstation that I’m using right now:

    #!/bin/bash
    dnf -y distro-sync && shutdown now

All this shell script contains is just the commands that I would normally run from the command line. With shell scripting, no coding skill at all is required for this.

Working with text files is way easier with shell scripting. Let’s take this text file with a listing of classic automobiles:

    plymouth    satellite      1970    154    600
    plymouth    fury           1970    73     2500
    plymouth    breeze         1996    116    4300
    chevy       malibu         2000    60     3000
    ford        mustang        1965    45     10000
    volvo       s80            1998    102    9850
    ford        thunderbird    2003    15     3500
    chevy       malibu         1999    50     3500
    bmw         325i           1985    115    450
    bmw         325i           1985    60     1000
    honda       accord         2001    30     6000
    ford        taurus         2004    10     17000
    toyota      rav4           2002    180    750
    chevy       impala         1985    85     1550
    ford        explorer       2003    25     9500
    jeep        wrangler       2003    54     1600
    edsel       corsair        1958    47     750
    ford        galaxie        1964    128    60

The fields in this file represent the make, model, year, mileage in thousands of miles, and U.S. dollar value of each car. Now, let’s say we want to sort this file alphabetically and save the output to a new file.
Here’s how you could do it with Python:

    #!/usr/bin/python
    def sort_file_content(in_path, out_path):
        lines = []
        with open(in_path) as in_f:
            for line in in_f:
                lines.append(line)
        lines.sort()
        with open(out_path, 'w') as out_f:
            for line in lines:
                out_f.writelines(line)

    if __name__ == "__main__":
        input_file = "autos.txt"
        output_file = "sorted_autos.txt"
        sort_file_content(input_file, output_file)

Here’s how you’d do it with a shell script:

    #!/bin/bash
    sort autos.txt > sorted_autos.txt

Either way, we get the same results, which look like this:

    bmw         325i           1985    115    450
    bmw         325i           1985    60     1000
    chevy       impala         1985    85     1550
    chevy       malibu         1999    50     3500
    chevy       malibu         2000    60     3000
    edsel       corsair        1958    47     750
    ford        explorer       2003    25     9500
    ford        galaxie        1964    128    60
    ford        mustang        1965    45     10000
    ford        taurus         2004    10     17000
    ford        thunderbird    2003    15     3500
    honda       accord         2001    30     6000
    jeep        wrangler       2003    54     1600
    plymouth    breeze         1996    116    4300
    plymouth    fury           1970    73     2500
    plymouth    satellite      1970    154    600
    toyota      rav4           2002    180    750
    volvo       s80            1998    102    9850

I think you see that doing this with shell scripting is way faster and easier. Finally, let’s see how shell scripting can help us with cloud operations.

Shell Scripting for Cloud Operations

Let’s say that you have a web server that’s running on either a VPS or a remote IoT device, and you want a list of IP addresses of clients that have accessed it, along with status codes and number of bytes transferred.
Here’s a Python script that you might use for that:

    #!/usr/bin/python
    import sys
    from dataclasses import dataclass

    @dataclass(frozen = True)
    class LogEntry:
        ip_address : str
        n_bytes : int
        status_code : int

    def main(args):
        # The log file path is the first command-line argument.
        file_path = args[0]
        entries = parse_log_file(file_path)
        for e in entries:
            print(e)

    def parse_log_file(file_path):
        try:
            with open(file_path) as log_file:
                return [parse_log_line(line) for line in log_file]
        except OSError:
            abort(f'File not found: {file_path}')

    def parse_log_line(line):
        # Field 1 is the client IP, field 9 the status code, field 10 the byte count.
        try:
            xs = line.split()
            return LogEntry(xs[0], int(xs[9]), int(xs[8]))
        except (IndexError, ValueError):
            abort(f'Invalid log file format: {line.strip()}')

    def abort(msg):
        print(msg, file = sys.stderr)
        sys.exit(1)

    if __name__ == '__main__':
        main(sys.argv[1:])

Here’s a bash script that does the same thing:

    #!/bin/bash
    echo "ip address, status code, number of bytes"
    cut -d" " -f 1,10,9 /var/log/httpd/access_log

That’s right. A simple, two-line shell script can take the place of that entire Python script. At any rate, the output of the shell script will look something like this:

    ip address, status code, number of bytes
    192.168.0.20 403 5760
    192.168.0.20 403 199
    192.168.0.20 200 4194
    192.168.0.20 200 5714
    192.168.0.20 404 196
    192.168.0.20 403 5760
    192.168.0.18 403 5760
    192.168.0.18 403 199
    192.168.0.18 200 4194
    192.168.0.18 200 5714
    192.168.0.18 404 196

You can also create shell scripts to automate management of your cloud services. For example, here’s a script that can start or stop an EC2 instance on Amazon Web Services:

    #!/bin/bash
    read -p "Enter the EC2 instance ID: " INSTANCE_ID
    read -p "Do you want to start or stop the instance? (start/stop): " ACTION
    if [[ "$ACTION" == "start" ]]; then
        echo "Starting instance $INSTANCE_ID..."
        aws ec2 start-instances --instance-ids "$INSTANCE_ID"
    elif [[ "$ACTION" == "stop" ]]; then
        echo "Stopping instance $INSTANCE_ID..."
        aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
    else
        echo "Oops! Please type 'start' or 'stop'."
    fi

When you run the script, just type in the instance ID at the first prompt, and then type either start or stop at the second prompt. This is a lot easier than typing the entire aws command every time you need to start or stop an instance. You can automate almost any other aws task in the same manner.

Conclusion

To be sure, shell scripting has its limitations. For large, complex programs that require high performance, Python, or perhaps even a compiled language such as C, would be much better. But as I’ve just demonstrated, there are many times when bash scripting is definitely a much better choice.

BASH
  • More portable
  • No library installation required
  • Always available
  • Best for quick jobs
  • Not the best for complex problems
  • Good performance for small jobs, but python is better for large jobs

PYTHON
  • Better for complex programming
  • Very flexible
  • Good performance
  • Steeper learning curve
  • Portability problems between Python 2 and Python 3
  • Dealing with libraries can be problematic
  • Not always available

To learn more about shell scripting, check out The Ultimate Linux Shell Scripting Guide by Donald:
GET 40% OFF on eBOOK

Hi again, Shreyans here.

Big thanks to Donald for putting this together. If you liked the piece, you’ll love his YouTube channel where he walks through practical Linux topics.

He’s also written two other books worth your time:
  • Linux Service Management Made Easy with systemd
  • Mastering Linux Security and Hardening

That’s all for now.
Hope you found something useful in this issue!

Cheers,
Shreyans

📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.

If you have any comments or feedback, just reply to this email.

Thanks for reading and have a great day!

Shreyans from Packt
07 Jul 2025
7 min read

Kubernetes Faces Gaps in Handling Device Failures for AI/ML Pods

Uber Cuts CI Costs by 53% Using Smarter Build Prioritization

CloudPro #98

One of the few GenAI tools that actually feels built for engineers

Most GenAI tools just dress up autocomplete. Shield’s AmplifAI is different. It uses agentic AI, systems that reason and act across steps, to take real work off your plate.

Think: auto-surfacing hidden compliance risks, navigating tangled comms threads, explaining every decision clearly. No magic, just well-architected automation with human-in-the-loop guardrails.

If you're curious what useful AI looks like in practice, start here.

> Attack graphs are redefining IAM risk modeling from the ground up
> Airbnb’s load testing framework bakes chaos into CI/CD
> Kubernetes is still awkward with GPU failures, and no one’s fixed it yet

Plus: SRE agents with $21M backing, mirrord’s new team debugging trick, and visual Kubernetes troubleshooting that finally makes sense.

Cheers,
Shreyans Singh
Editor-in-Chief

Network security that just works: no apps, no friction

Security shouldn’t depend on whether your users remember to install something. That’s why I found Whalebone so interesting: it protects millions of devices from phishing, malware, and scams at the DNS level, no downloads required.

It’s cleanly integrated, telco-ready, and surprisingly quick to deploy (2 months). Telcos like O2 and A1 are already using it to boost ARPU while quietly shielding users in the background.

🔐 Cloud Security

Why Default Pod Communication in Kubernetes is a Security Risk
By default, all pods in a Kubernetes cluster can talk to each other, which simplifies app deployment but opens up security risks. Network policies are the main way to restrict this traffic, using labels and namespaces to control ingress and egress. Support for policies depends on your CNI plugin: tools like Calico enable advanced rules, while others like flannel do not.

Why IAM demands an Attack Graph first approach
Most IAM programs start with static access lists, but attackers exploit paths, not lists. An Attack Graph shows how identities and permissions can be chained for lateral movement and takeover. By modeling these paths first, security teams can prioritize real, exploitable risks and fix what matters. This shift helps align identity security with how attacks actually happen, not just how access is managed.

12-Month Cloud Security Challenge Just Dropped – Practice, Compete, and Get Certified
Wiz has launched Cloud Champions, a monthly CTF challenge series focused on real-world cloud security scenarios. Each challenge is crafted by Wiz researchers and designed to help practitioners sharpen their skills through hands-on problem-solving. The first challenge, “Perimeter Leak,” went live in June, with more slated through May 2026. A leaderboard tracks participant progress and highlights top performers.

Building AI agents that hunt like cloud adversaries
Security researchers are building AI agents that think and act like advanced cloud attackers: chaining permissions, pivoting across services, and executing real-world privilege escalation paths in AWS. These agents outperform traditional tools by reasoning contextually and automating multi-step attack logic.

Simplify Kubernetes Security With Kyverno and OPA Gatekeeper
Kyverno and OPA Gatekeeper help secure Kubernetes by blocking risky configurations before they’re deployed. Kyverno is easier to use, with YAML policies and native Kubernetes integration, while OPA Gatekeeper offers deeper flexibility using Rego for complex rules. Both tools can enforce critical security practices, like banning :latest image tags, to improve cluster safety and compliance.

⚙️ Infrastructure & DevOps

Uber Cuts CI Costs by 53% Using Smarter Build Prioritization
Uber enhanced its SubmitQueue CI system to reduce CPU usage by 53% and cut wait times by 37% across its massive monorepos. The update uses a new probabilistic model to prioritize builds that are more likely to succeed or unblock smaller changes. This lets faster commits bypass larger ones.

Figma spends $300,000 on AWS daily
Figma disclosed in its IPO filing that it now spends nearly $300,000 daily on AWS, committing to $545 million over five years. The design platform is fully dependent on AWS infrastructure and policies, highlighting vendor lock-in risks.

TOP 10 DevOps Tools in 2025: Based on 300 LinkedIn job posts
GitHub Actions, Terraform, Kubernetes, and ArgoCD top the list, praised for integration and power, but not without their quirks. The takeaway: there's no perfect stack, just the right mix for your team’s context and scale.

mirrord Adds Queue Splitting to Enable Shared Debugging in the Cloud
mirrord for Teams now supports queue splitting, letting developers work on the same service in a shared cloud environment without stepping on each other’s toes. With support for AWS SQS (Kafka and RabbitMQ coming soon), devs can apply filters so only their local app receives relevant messages. This enables real-time debugging with zero disruption to live services or teammates.

📦 Kubernetes & Cloud Native

Kubernetes Faces Gaps in Handling Device Failures for AI/ML Pods
As AI/ML workloads relying on GPUs become more common, Kubernetes struggles with device failure modes like partial GPU outages, degraded performance, and scheduling fragility. DIY fixes exist, but lack standardization, and core systems don’t correlate device health with pod behavior.

Simplifying platform engineering at John Lewis - part one | Google Cloud Blog
John Lewis replaced its monolithic commerce system with a multi-tenant, microservice-based architecture on Google Kubernetes Engine. A central “paved road” platform now automates provisioning, observability, and security, letting product teams deploy independently while maintaining guardrails. This approach boosts developer velocity, minimizes cognitive load, and balances consistency with flexibility as new services emerge.

A visual guide on troubleshooting Kubernetes deployments

Azure Boosts PostgreSQL Performance on AKS With Local NVMe & CloudNativePG
Microsoft now supports high-performance PostgreSQL on Azure Kubernetes Service using local NVMe via Azure Container Storage and the CloudNativePG operator. Benchmarks show up to 26,000 TPS with sub-5ms latency. For price-sensitive workloads, Premium SSD v2 offers flexible scaling and solid performance.

🔍 Observability & SRE

Airbnb Scales Load Testing with Impulse Framework
Airbnb developed Impulse, a decentralized load-testing framework integrated with CI/CD, to help teams test service reliability at scale. It includes a context-aware load generator, dependency mocker, traffic replay collector, and synthetic API generator for async flows.

How we're building an agentic system to drive Grafana | Grafana Labs
Grafana is moving beyond simple AI chat responses by building agentic systems that can reason and take action, like creating dashboards or debugging metrics, based on real-time context. Powered by the open source MCP Server, these agents interact with Grafana APIs to perform complex, multi-step workflows.

Ciroos Launches AI SRE Teammate with $21M in Funding
Ciroos has raised $21 million to launch its AI-powered “SRE Teammate,” a multi-agent system that autonomously detects, diagnoses, and resolves incidents across cloud, Kubernetes, and networking environments. Unlike traditional observability tools, it acts like an expert partner, correlating signals and automating root-cause analysis without runbooks.

Benchmarking OpenTelemetry Overhead in Go Applications
A recent benchmark measured the performance impact of enabling OpenTelemetry tracing in a Go app under 10,000 req/s. CPU usage rose ~35% and memory jumped from 10MB to 15–18MB, mostly due to span processing. p99 latency increased by ~5ms, and outbound telemetry added 4MB/s of network traffic.

Shreyans from Packt
16 Jun 2025
9 min read

How to Make Sure Your Kubernetes Sidecar Starts Before the Main App

Why Automatic Rollbacks Are Risky and Outdated in Modern DevOps

CloudPro #96

Platform Weekly - the world’s largest platform engineering newsletter

With over 100,000 weekly readers, Platform Weekly dives into platform engineering best practices, platform engineering news, and highlights, lessons and initiatives from the platform engineering community.

📌 A hidden prompt injection flaw in GitLab Duo that quietly leaked source code
📌 Just-in-time AWS access using Entra PIM (yes, that’s possible now)
📌 Cloud SQL charging 2TB storage for 6GB of data, because of WAL logs
📌 Why automatic rollbacks in DevOps might be doing more harm than good

You’ll also find sharp reads on scaling Terraform teams, new volume tools for AI/ML in GKE, and a brutally honest take on Kubernetes complexity. On the observability side, AWS added visual dashboards to Network Firewall, and OpenTelemetry clarified how to treat logs vs. events.

Hope you find something that helps you ship safer, smarter, or faster.

Cheers,
Shreyans Singh
Editor-in-Chief

PS: If you’re not already reading Platform Weekly, I’d recommend it. It’s one of the few newsletters I make time for every week: focused on platform engineering, cloud native, and the kind of problems teams actually face. 100,000+ people read it, but it still feels like it’s written by someone who gets it.

🔐 Cloud Security

Just-in-time AWS Access to AWS with Entra PIM
Just‑in‑time privileged access can be implemented by integrating Microsoft Entra PIM with AWS IAM Identity Center using SCIM/SAML, enabling temporary group-based access tied to approval workflows and time limits. By mapping Entra security groups to AWS permission sets (e.g. EC2AdminAccess) and enabling eligibility/activation in PIM, users gain access only when approved, and only for a set duration.

On‑Demand Rotation Now Available for KMS Imported Keys
AWS KMS now lets you rotate imported symmetric key material on‑demand without needing to create a new key or change its ARN, simplifying compliance and security by avoiding workload disruptions. New API operations, including RotateKeyOnDemand and KeyMaterialId tracking, let you import, rotate, audit, expire, or delete individual key versions while retaining decryption access to older ciphertext.

CloudRec: multi-cloud security posture management (CSPM) platform
CloudRec is an open‑source, scalable CSPM platform that continuously discovers 30+ cloud services across AWS, GCP, Alibaba, and more, offering real‑time risk detection and remediation. It uses OPA‑based declarative policy management, enabling dynamic, flexible rule definitions without code changes or redeployment.

How to use the new AWS Secrets Manager Cost Allocation Tags feature
AWS Secrets Manager now supports cost allocation tags, letting you tag each secret (e.g., with CostCenter) and track its costs in Cost Explorer or cost-and-usage reports. Enable tags in Billing → Cost Allocation Tags, then filter or group secrets costs by tag to see spend per department or project.

GitLab Duo Prompt Injection Leads to Code and Data Exposure
A hidden prompt injection flaw in GitLab Duo allowed attackers to embed secret instructions, camouflaged in comments, code, or MR descriptions, triggering the AI assistant to reveal private source code. The attacker leveraged streaming markdown rendering and HTML injection (like <img> tags) to exfiltrate stolen code via base64-encoded payloads. GitLab patched the vulnerability in February 2025, blocking unsafe HTML elements and tightening input handling.

⚙️ Infrastructure & DevOps

Amazon API Gateway introduces routing rules for REST APIs
Amazon API Gateway now supports routing rules for REST APIs on custom domains, allowing dynamic routing based on HTTP headers, URL paths, or both. This enables direct A/B testing, API versioning, and backend selection, removing the need for proxies or complex URL structures.

Amazon EC2 now enables you to delete underlying EBS snapshots when deregistering AMIs
Earlier, snapshots had to be removed separately, often leading to orphaned volumes and wasted spend. Now, Amazon EC2 lets users automatically delete EBS snapshots when deregistering AMIs, cutting down on manual cleanup and storage costs. This update streamlines resource management with no extra cost and is available across all AWS regions.

Why is your Google Cloud SQL bill so high?
A developer discovered that their Cloud SQL instance showed 2 TB of usage for only 6 GB of actual data, due to retained Write-Ahead Logs (WAL) from Point-in-Time Recovery. These logs can silently bloat storage costs when frequent transactions occur. To control costs, users should reduce WAL retention or re-provision instances with right-sized storage.

Why Automatic Rollbacks Are Risky and Outdated in Modern DevOps
Automatic rollbacks seem helpful but often fail due to the same issues that break deployments, like expired credentials or partial database changes. Modern practices like Continuous Delivery and progressive deployment (canary, blue/green, feature flags) offer safer, faster recovery paths. Human oversight adds resilience and learning, making manual intervention more effective than rollback automation.

How to structure Terraform deployments at scale
At scale, Terraform deployments require a clear structure that balances control and team autonomy. Scalr’s two-level hierarchy (Account and Environment scopes) lets central DevOps manage policies and modules, while engineers deploy independently within isolated workspaces. This setup encourages reusable code and standardization through a shared module registry.

📦 Kubernetes & Cloud Native

Making Kubernetes Event Management Easier with Custom Aggregation
As Kubernetes clusters grow, managing events becomes harder due to high volume, short retention, and poor correlation. This article shows how to build a custom event system that groups related events, stores them longer, and spots patterns, helping teams debug issues faster. It uses Go to watch, process, and store events, and includes options for alerts and pattern detection.

GKE Volume Populator Simplifies AI/ML Data Transfers in Kubernetes
Google Cloud’s new GKE Volume Populator helps AI/ML teams automatically move data from Cloud Storage to fast local storage like Hyperdisk ML, no custom workflows needed. It uses Kubernetes-native PVCs and CSI drivers to manage transfers, delays pod scheduling until data is ready, and supports fine-grained access control.

How to Make Sure Your Kubernetes Sidecar Starts Before the Main App
If your app depends on a sidecar, Kubernetes doesn’t guarantee the sidecar is fully ready before the main container starts, even with the new native support. This article shows how to delay the app start using startupProbe or postStart hooks in the sidecar. These methods let the app wait until the sidecar is actually ready, avoiding startup errors without needing code changes.

Not every problem needs Kubernetes
Kubernetes promises scalability and flexibility, but for most teams, it adds unnecessary complexity. Many workloads can be handled more easily with VMs, managed cloud services, or simpler container platforms like AWS Fargate or Google Cloud Run. Unless you truly need hybrid cloud, global scale, or run hundreds of services, Kubernetes may just slow you down and drain resources.

What You Actually Need for Kubernetes in Production
Production Kubernetes setups need more than just working clusters. Use readiness, liveness, and startup probes correctly to avoid early traffic issues or restarts. Always define CPU and memory limits, isolate secrets using volumes, and enforce RBAC with least privilege. Use HPA for scaling, avoid local storage, and apply network policies to control traffic. Tools like kube-bench, Trivy, and FluentBit help monitor security, cost, and logs effectively.

🔍 Observability & SRE

AWS Network Firewall launches new monitoring dashboard
AWS Network Firewall now includes a monitoring dashboard that shows key traffic patterns like top flows, TLS SNI, HTTP host headers, long-lived TCP flows, and failed handshakes. This helps teams troubleshoot issues and spot security concerns faster. It’s available in all supported regions at no extra firewall cost, but requires Flow and Alert logs to be configured.

Official RCA for SentinelOne Global Service Interruption
SentinelOne’s May 29 global service outage was caused by a software flaw in a deprecated infrastructure control system, which accidentally deleted critical network routes. This broke internal connectivity, taking down management consoles and related services. While customer endpoints stayed protected, teams lost visibility and control during the incident.

There's a Lot of Bad Telemetry Out There
Much of today’s telemetry is noisy, irrelevant, or misleading, causing higher costs, slow troubleshooting, and poor decisions. Common problems include incomplete traces, outdated metrics, irrelevant logs, and data overload. Engineers often lack clear standards or guidance on good telemetry, especially for newer systems like LLMs. To fix this, teams should define what's useful, apply consistent conventions (e.g. OpenTelemetry), and work closely with devs to improve instrumentation at the source.

OpenTelemetry Clarifies Its Approach to Logs and Events
OpenTelemetry treats logs as structured records sent through its Logs API, with a special focus on events: logs with a defined schema and guaranteed structure. Events are preferred for new instrumentation, as they integrate with context and can correlate with traces and metrics. Unlike spans, events have no duration or hierarchy. OpenTelemetry recommends using logs mainly for bridging existing systems, while semantic instrumentation should rely on events for consistency and context sharing.

Storing all of your observability signals in one place matters!
Treating traces, logs, and metrics as separate “pillars” creates silos and hinders correlation. Many teams still split signals across tools or vendors, leading to fragmented insights and painful debugging. A centralized “single pane of glass” setup helps correlate signals in one place, making it easier to understand system behavior.
Shreyans from Packt
30 Jun 2025
8 min read

Migrating Uber’s Compute Platform to Kubernetes

How to Break Up a Terraform Terralith Without Breaking EverythingCloudPro #97All Books $9.99 | 8 Hours RemainingSHOP NOW1. AWS’s own security tool introduced a privilege escalation risk2. Terraliths slowing you down? Here's how to break them up safely3. Uber’s 3M-core migration to Kubernetes: what it really tookPlus: BitM attacks that bypass MFA, schema migration via CI/CD, and a no-fluff guide to how Kubernetes CRDs actually work.Cheers,Shreyans SinghEditor-in-Chief🔐 Cloud SecurityAWS Launches Threat Technique Catalog to Share Real-World Attack DataAWS has released the Threat Technique Catalog, a resource mapping real-world attack techniques seen in customer incidents to the MITRE ATT&CK framework. Built from AWS CIRT investigations, it includes detection and mitigation advice for tactics like token abuse and misconfigured encryption. This gives cloud defenders a practical way to strengthen their AWS environments using adversary-informed data.AWS Launches Preview of Upgraded Security HubAWS has released a preview of its revamped Security Hub, now offering integrated dashboards, exposure mapping, and attack path visualizations to better prioritize and respond to security threats. It correlates findings across GuardDuty, Inspector, Macie, and CSPM to highlight critical gaps and risks.AWS Built a Security Tool. It Introduced a Security Risk.AWS’s “Account Assessment for AWS Organizations” tool unintentionally introduced a cross-account privilege escalation risk due to insecure deployment instructions. It advised users to avoid the management account without clarifying that deploying the hub role in a less secure account could expose high-sensitivity environments. AWS has since updated its documentation to recommend using a secure account.Forgotten DNS Records Enable CybercrimeA threat actor dubbed Hazy Hawk is hijacking abandoned cloud resources, like AWS S3 buckets and Azure endpoints, through dangling DNS records. 
By taking over subdomains of major organizations, including CDC, Deloitte, and universities, they reroute users to scams and malware via complex traffic distribution systems. The attacks exploit subtle DNS misconfigurations and show how unmanaged cloud resources can silently expose enterprise users to persistent threats.Browser-in-the-Middle Attacks Bypass MFA to Steal Sessions in Real TimeMandiant warns of a growing threat called Browser-in-the-Middle (BitM), where attackers proxy real login pages through their own browsers to steal fully authenticated sessions, even after MFA. BitM tools like Mandiant's internal “Delusion” make this scalable and fast, bypassing traditional phishing protections. Only hardware-backed MFA like FIDO2 or client certificates can reliably block these attacks.Workshop: Unpack OWASP Top 10 LLMs with SnykJoin Snyk and OWASP Leader Vandana Verma Sehgal on Tuesday, July 15 at 11:00AM ET for a live session covering:-The top LLM vulnerabilities-Proven best practices for securing AI-generated code-Snyk’s AI-powered tools automate and scale secure dev.See live demos plus earn 1 CPE credit!Register today⚙️ Infrastructure & DevOpsAWS CloudTrail Adds Detailed Logging for S3 Bulk DeletesAWS CloudTrail now logs individual object deletions made via the S3 DeleteObjects API, not just the bulk operation. This gives teams clearer visibility into which files were removed, improving audit trails and helping meet compliance and security needs. Granular logs also allow finer control via event selectors.AWS Backup adds new Multi-party approval for logically air-gapped vaultsAWS Backup now supports multi-party approval for logically air-gapped vaults, allowing secure recovery even if your AWS account is compromised. Admins can assign trusted approval teams to authorize vault access from outside accounts. 
This provides an independent, auditable recovery path, strengthening ransomware resilience and governance for critical backups.Inside AWS’s Strategy for Building Bug-Free, High-Performance SystemsAWS shared how it integrates formal and semi-formal methods, like TLA+, model checking, fuzzing, and deterministic simulation, into everyday development to eliminate bugs, boost developer speed, and enable aggressive optimizations. Tools like the P language and PObserve are used across S3, DynamoDB, EC2, and Aurora to model distributed systems, validate runtime behavior, and prove correctness of critical code paths.How to Break Up a Terraform Terralith Without Breaking EverythingLarge monolithic Terraform setups (“Terraliths”) can slow down deploys and increase risk. This guide lays out a clean migration path, starting with dependency mapping and backups, then moving to new root modules using import and removed blocks (in TF 1.7+), or scripted state mv operations. It also covers real-world lessons on inter-module communication, safe rollouts, automation, and state isolation, helping teams modernize IaC safely and modularly.Why It’s Time to Automate Your Database Schema MigrationsMany teams automate their app deployments but still manage database changes manually, leaving room for human error, schema drift, and security risks. This guide explains how tools like Atlas bring schema migrations into your CI/CD pipelines using declarative definitions, automatic diffs, and linting. The result: safer deployments, fewer production credentials, and consistent environments.📦 Kubernetes & Cloud NativeAmazon EKS Pod Identity adds cross-account access supportAmazon EKS Pod Identity now supports cross-account resource access without code changes. You can assign a second IAM role from another AWS account when creating a pod identity, enabling secure access to resources like S3 or DynamoDB via IAM role chaining. 
This simplifies multi-account architectures in EKS and reduces the complexity of credential management.Amazon GuardDuty expands Extended Threat Detection coverage to Amazon EKS clustersAmazon GuardDuty now detects advanced attack sequences in EKS clusters by correlating signals across audit logs, runtime activity, and API usage. This helps uncover threats like privilege escalation and secret exfiltration that might be missed by isolated alerts. It gives security teams a complete view of Kubernetes compromises and reduces time to investigate and respond.How CRDs Extend and Hook into the Kubernetes APIThis deep dive explains how Kubernetes Custom Resource Definitions (CRDs) work behind the scenes. It walks through how CRDs register with the Kubernetes API, how schemas validate custom objects, and how controllers fetch and handle them via client-go. You’ll learn how CRDs are serialized, discovered, and routed through the aggregation layer, giving you a detailed mental model for building robust Kubernetes extensions.Migrating Uber’s Compute Platform to KubernetesUber migrated all stateless services, powering 3M+ cores and 100K daily deployments, from Mesos to Kubernetes to standardize infrastructure and tap into the cloud-native ecosystem. They tackled extreme scale (7,500-node clusters), rebuilt integrations, and automated the shift using their internal “Up” platform. Custom solutions like artifact preservation, gradual scaling, and rollout heuristics ensured reliability, while Kubernetes UI and scheduler tweaks enabled smooth operations.Stop Building Platforms Nobody Uses: Pick the Right Kubernetes Abstraction with GitOpsThis post calls out a common pitfall: over-engineering internal platforms that developers don’t adopt. It argues that real developer pain: context switching, CI/CD complexity, insecure YAML sprawl, must shape the abstraction layer. 
Tools like Kro and Score can simplify Kubernetes via GitOps, but only when they reduce complexity without hiding critical decisions. The message: build abstractions that solve real problems, not just tick architectural boxes.🔍 Observability & SREAmazon VPC Route Server announces logging enhancementsAWS has added new monitoring features to VPC Route Server, including real-time logs for BGP and BFD sessions, historical data tracking, and flexible delivery via CloudWatch, S3, and Firehose. This helps engineers troubleshoot connectivity issues faster without needing AWS Support.Amazon Athena adds managed query results with built-in storage and cleanupAmazon Athena now supports managed query results, eliminating the need to preconfigure S3 buckets or manually clean up old results. This simplifies analysis workflows, especially for teams using automated workgroup creation.Grepr - Dynamic ObservabilityGrepr launched an ML-powered observability pipeline that filters, aggregates, and routes telemetry data before it hits your tools, reducing log volumes and storage costs significantly. It can scale automatically, backfill data during incidents, and runs alongside existing setups with minimal config. Ideal for teams seeking cost control without losing visibility.Chip auto-detects root causes without manual alerting or dashboardsChip is a zero-config monitoring agent that auto-instruments apps and alerts only on real customer-impacting issues. It tracks everything from code commits to Kubernetes events to find root causes fast, using real-time outlier and cohort detection. Built for fast-moving teams who want signal without the noise.Parseable offers fast, open-source observability on S3 with low resource useParseable is a lightweight, S3-first observability platform designed for speed and cost-efficiency. It delivers 90% faster queries than Elastic, uses up to 70% less CPU/memory, and integrates easily with AI and observability tools. 
Fully open source with no vendor lock-in.

Disclaimer: Some eBooks and videos are excluded from the $9.99 offer. For selected countries, tiered discount pricing may vary.

Shreyans from Packt
02 Jun 2025
8 min read

How AWS Lambda Handles Billions of Async Requests Without Breaking a Sweat

How Netflix stores 140 million hours of viewing data per day
CloudPro #94

[Sponsored] Learn how your app could evolve automatically, leaving reverse engineers behind with every release.

This week’s CloudPro has a bunch of things that made me pause and go, “Wait, that’s possible?”
📌 A GitHub token leak that kicked off a supply chain attack targeting 100K+ repos
📌 Git tools quietly leaking your credentials with just a newline
📌 Kubernetes Ingress-NGINX bugs that might be hiding in your setup without you knowing

There are also some great deep dives, like how Netflix handles 140 million hours of data every day, a homegrown Python bot that auto-heals K8s IP issues, and a hands-on post about cutting a $10K Glue bill down to $400 using Airflow.

Hope a few of these help you solve something annoying or spark a weekend project.

Cheers,
Shreyans Singh
Editor-in-Chief

🔐 Cloud Security

Multiple Vulnerabilities Found in Kubernetes Ingress-NGINX
Several security flaws (CVEs) were found in the Kubernetes ingress-nginx controller. These issues do not affect Amazon EKS directly because EKS doesn’t include this controller by default. However, if customers manually installed it, they should update to the latest version. AWS has already alerted affected users.

How a Leaked GitHub Token Sparked a Widespread Supply Chain Attack Targeting Coinbase and 100,000+ Repos
Attackers pulled off a stealthy supply chain attack by obtaining a leaked GitHub token from a SpotBugs project, then using it to compromise other GitHub actions like reviewdog and tj-actions. They injected malicious code that silently spread through CI/CD workflows, eventually targeting Coinbase’s open-source project.

GitHub Finds Critical ruby-saml Flaws Letting Attackers Bypass SSO and Hijack Accounts
GitHub found two serious bugs in the ruby-saml library that let attackers bypass SAML authentication and potentially log in as any user.
The problem came from how different XML parsers (REXML and Nokogiri) interpret the same data differently, letting attackers sneak in fake but valid-looking login info.

Git Tools Exposed: Bugs in GitHub Desktop, LFS, and CLI Let Attackers Steal User Credentials
A security researcher found that several Git-related tools, including GitHub Desktop, Git Credential Manager, Git LFS, and GitHub CLI, had flaws that let attackers trick them into leaking stored credentials (like tokens or passwords) to malicious servers. Most issues stemmed from how these tools handled special characters like carriage returns or newlines in URLs, causing credentials meant for GitHub to be sent elsewhere.

Microsoft Expands Security Copilot with AI Agents to Tackle Phishing, Insider Risks, and Shadow AI Threats
Microsoft has upgraded Security Copilot with AI agents that can now handle tasks like phishing detection, insider risk alerts, and vulnerability patching automatically. These agents help security teams work faster and smarter, especially as cyberattacks become too complex and frequent for humans alone.

[Sponsored] Web Devs: Turn Your Knowledge Into Income
Build the knowledge base that will enable you to collaborate with AI for years to come.
💰 Competitive Pay Structure
⏰ Ultimate Flexibility
🚀 Technical Requirements (No AI Experience Needed)
Weekly payouts + remote work: The developer opportunity you've been waiting for! The flexible tech side hustle paying up to $50/hour.

⚙️ Infrastructure & DevOps

AWS Launches Amazon Q Scenarios in QuickSight to Bring Forecasting and What-If Analysis to Everyone
AWS has launched the new "scenarios" feature in Amazon Q for QuickSight, letting users analyze data trends, forecast outcomes, and run what-if simulations, all through simple natural language. You don’t need to be a data expert or use spreadsheets anymore.
This tool helps teams make smarter decisions faster.

How AWS Lambda Handles Billions of Async Requests Without Breaking a Sweat
When functions are called asynchronously, Lambda queues them, processes them later, and manages retries. For small apps, a single queue may be enough, but for massive scale, AWS uses smart techniques like consistent hashing and shuffle-sharding to separate workloads and reduce the risk of “noisy neighbors” affecting others.

AWS CodeBuild Adds Parallel Test Execution to Drastically Speed Up CI Pipelines
AWS just made it possible to run tests in parallel using CodeBuild, which means instead of testing code one piece at a time, you can test many pieces at once. This massively cuts down the time it takes for developers to know if their code works, making software updates much faster and less frustrating.

How I reduced $10000 monthly AWS Glue bill to $400 using Airflow
Akash and his team were spending $10,000/month running data pipelines on AWS Glue, but much of that cost came from paying for idle time. To fix it, they moved all those jobs to Apache Airflow running on EC2 and ECS, using Terraform to manage everything. It was tough (especially setting up workers, Redis, and autoscaling), but they pulled it off and slashed their bill to just $400/month.

How to run Firecracker without KVM on cloud VMs
Normally, to run lightweight virtual machines (like Firecracker microVMs), you need special hardware features (KVM) or expensive bare-metal cloud servers. But a new method called PVM (Pagetable Virtual Machine), developed by Ant Group and Alibaba, lets you run Firecracker without KVM, even on cheaper cloud VMs that don’t support nested virtualization.

📦 Kubernetes & Cloud Native

Kubernetes launches kube-scheduler-simulator
When Kubernetes decides where to run an app (called a Pod), it uses a complex component called the scheduler. But understanding why the scheduler makes certain decisions has always been hard. It’s like a black box.
This new tool, kube-scheduler-simulator, opens up that black box. It lets you simulate a real cluster and see exactly how the scheduler makes its choices.

Kubernetes Launches JobSet to Simplify Large-Scale AI and HPC Workloads
As AI models get bigger, training them requires splitting the work across thousands of GPUs or TPUs spread over many servers. Kubernetes can help manage this, but its current tools aren't built to easily handle these complex, multi-part jobs. So, the Kubernetes team introduced JobSet, a new tool that makes it easier to run these distributed training jobs.

Kubernetes 1.32 Unlocks Smarter, Safer Linux Swap Support
Earlier, Kubernetes completely disabled swap because it couldn't track memory usage well when swap was involved. But now, after years of progress, Kubernetes 1.32 is finally adding proper support for Linux swap memory, which lets systems use disk space as extra RAM to avoid crashes during memory spikes.

How One Home Kubernetes User Beat ISP IP Changes with an Auto-Healing Python Bot
The author runs a home Kubernetes setup and relies on a dynamic IP address from their internet provider, which can unexpectedly change. Since IP changes can break things like firewall rules or service configurations, they built a Python program that constantly monitors their IPs. If the IP changes, it automatically updates firewall settings and Kubernetes resources to keep everything running smoothly.

Devtron + Argo CD: Enhancing GitOps without disruption
Teams are shipping code faster thanks to AI tools like GitHub Copilot, but their deployment systems, especially Argo CD, can’t keep up. Instead of replacing Argo CD, Devtron now integrates directly with it.
This gives users more powerful deployment features like multi-cluster control, better security, and advanced rollout strategies, without breaking or migrating their existing setup.

🔍 Observability & SRE

Building a Searchable, Structured Logging System for Real-World Debugging
The author built a better logging system to help debug issues in a complex app. Instead of messy, inconsistent logs, they used structured logs that are easy to search, and even “canonical” logs that summarize everything about a request in one line. They sent these logs to tools like Loki and Clickhouse, so they could ask smart questions and actually learn from the data.

How Netflix stores 140 million hours of viewing data per day
Netflix collects an enormous amount of viewing data every day: from what you watch to when you pause. As this data exploded, their original system started to slow down. So they redesigned it: recent data is stored fast and uncompressed, older data is compressed and moved to long-term storage, and less important data (like short previews) is filtered out.

How to build the ultimate March Madness dashboard in Grafana
A techie March Madness fan built a real-time basketball tracking dashboard in Grafana that pulls live NCAA data, like scores and player stats, directly from public APIs.
Using Grafana’s Infinity and Canvas plugins, they turned raw JSON into a jumbotron-style scoreboard that updates without refreshes.
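
The AWS Lambda item earlier in this issue mentions shuffle-sharding as a way to keep one noisy tenant from affecting everyone else. The summary gives no implementation details, so the following is only a minimal Python sketch of the general idea, not AWS's actual design: each customer is deterministically mapped to a small subset of queues, so two customers rarely share their entire subset. The function name `shuffle_shard`, the queue count, and the shard size are all hypothetical choices made for illustration.

```python
import hashlib
import random


def shuffle_shard(customer_id: str, num_queues: int, shard_size: int) -> tuple[int, ...]:
    """Deterministically pick `shard_size` distinct queue indexes for a customer.

    Illustrative only: the hashing scheme and parameters are assumptions,
    not taken from the Lambda article.
    """
    if not 0 < shard_size <= num_queues:
        raise ValueError("shard_size must be between 1 and num_queues")
    # Derive a stable seed from the customer id so the same customer
    # always lands on the same set of queues.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return tuple(sorted(rng.sample(range(num_queues), shard_size)))


if __name__ == "__main__":
    for customer in ("customer-a", "customer-b", "customer-c"):
        print(customer, shuffle_shard(customer, num_queues=8, shard_size=2))
```

With 8 queues and a shard size of 2 there are 28 possible queue pairs, so a misbehaving customer fully overlaps with only a small fraction of other customers; the rest still have at least one unaffected queue.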

Shreyans from Packt
14 Jul 2025
7 min read

Microsoft engineers contributed a new authentication method to Grafana

What Would a Kubernetes 2.0 Look Like
CloudPro #99

Daily Cloud Insights. Follow Packt SysOps on LinkedIn.

> Grafana now supports Azure managed identities, so you can skip the usual credential headaches. Really useful if you’re juggling OAuth providers.
> Google is catching leaked credentials in public repos within minutes, which honestly should’ve been standard by now.
> Kubernetes is adding smarter routing for LLM workloads, reducing GPU bottlenecks. Could be worth a look if you’re running GenAI models.
> And there’s finally a practical guide for securing OpenTelemetry collectors with proper mTLS in Kubernetes: cleaner architecture for multi-cluster setups.

We also have some good reads on safer curl installs, scaling Argo CD, debugging Kubernetes deployments, and cutting observability costs without sacrificing coverage.

Already get your weekly CloudPro updates? Packt SysOps keeps you sharp every single day. One quick, practical post every day at 9AM, covering cloud security fixes, Kubernetes tips, DevOps tooling, and scaling lessons from real teams. Follow the page. Stay updated in 2 minutes.

Cheers,
Shreyans Singh
Editor-in-Chief

🔐 Cloud Security

Microsoft engineers contributed a new authentication method to Grafana, enabling “managed identity” logins tied to Azure’s identity system. This eliminates the need for credentials or certificate rotation by authenticating users based on identity claims. The change allows Grafana users to mix authentication methods and extends to any OAuth 2.0-based identity provider.

How Google Cloud is securing open-source credentials at scale
Google Cloud has launched automated scanning for leaked Google Cloud credentials in public open-source artifacts like Docker images and package repositories. The system flags credentials within minutes of publication and alerts users via email or product logs.
This aims to reduce cloud breaches from credential leaks, which account for 16% of incidents.

Building a cloud security roadmap: Tools by layer and when you need them
Grounded Cloud Security published a detailed guide on choosing security tools based on cloud architecture layers: control plane, orchestration, platform, and application. It explains common threats like API key leaks, container misconfigurations, and application exploits, mapping them to tools like CNAPP, CSPM, KSPM, and PAM.

Exposing OpenTelemetry Collector Securely with Gateway API and mTLS
A new guide explains how to securely expose OpenTelemetry Collectors in Kubernetes using the Gateway API with mutual TLS. This setup helps teams aggregate telemetry from external apps, multi-cluster services, or hybrid environments while enforcing strong authentication. The approach uses Istio’s Gateway API and mTLS to protect gRPC endpoints.

AWS published a step-by-step guide on building a secure serverless streaming pipeline using Amazon MSK Serverless and EMR Serverless with IAM authentication. It shows how to ingest data with Kafka, process it via Spark Structured Streaming, and query outputs in S3 using Athena. This design eliminates manual TLS setups, simplifies scaling, and enforces IAM-based access control, which makes it ideal for teams seeking managed, low-ops streaming pipelines.

A new CLI tool called vet has launched to secure the common curl | bash install pattern. It fetches remote install scripts, shows diffs from previous runs, runs ShellCheck for linting, and requires user approval before execution. vet targets DevOps teams wanting safer automation workflows, reducing risk from blind script execution.

⚙️ Infrastructure & DevOps

Google Cloud and Docker are simplifying AI app deployment with native support for Docker Compose on Cloud Run.
Developers can now use gcloud run compose up to deploy multi-container AI apps from a local compose.yaml file, including GPU-backed models, with one command.

Google Cloud detailed strategies for optimizing GKE workload scheduling when resources are tight. Techniques include workload priorities, balloon pods for quick scaling, compute classes for fallback node types, and multi-cluster setups to “capacity chase” across regions. This helps platform teams maintain performance while balancing cost and resource availability.

This guide outlines how to deploy a production-ready, self-managed MySQL 8.0 instance on Google Cloud using OpenTofu/Terraform. It emphasizes enterprise-grade practices like secret management with Google Secret Manager, Shielded VM security, automated backups to Cloud Storage, and modular IaC design. Ideal for teams needing fine-grained control over database infrastructure without sacrificing security or operational standards.

Simplifying platform engineering at John Lewis - part two | Google Cloud Blog
John Lewis built a custom Kubernetes controller on Google Cloud to abstract complex Kubernetes configurations for developers. Their Microservice CRD reduces YAML complexity, enforces best practices, and automates features like Prometheus configs and service mesh enrollment.

Apptainer, the open-source container platform for HPC environments, has released version 1.4.1 with improved OCI (Open Container Initiative) build support and better integration with BuildKit. It continues to focus on secure, portable containers with an immutable single-file format, supporting GPUs and parallel filesystems.

📦 Kubernetes & Cloud Native

A new Inference Extension for Kubernetes Gateway API introduces model-aware traffic routing for LLM and GenAI workloads. It enables smarter request distribution using live model metrics like queue length and GPU load, reducing latency and improving GPU efficiency.
Early benchmarks show lower tail latencies compared to standard Kubernetes Services, especially at high QPS levels.

What Would a Kubernetes 2.0 Look Like
The post argues that Kubernetes should fix long-standing pain points in a future 2.0 version: ditch YAML for HCL to avoid type errors, replace etcd with pluggable backends like SQLite/Raft for smaller clusters, and introduce a native package manager to replace Helm’s fragile templating. Other ideas include IPv6 by default and simpler networking for more scalable and developer-friendly clusters.

How Argo CD Handles 500+ vClusters and Where It Breaks
A new deep-dive shows the scaling limits of Argo CD on a control plane managing 1,000 virtual clusters (vClusters) with GitOps. Performance remained stable up to ~500 clusters and ~500 apps, but beyond that, Argo CD controllers hit memory limits and the UI became sluggish. The test highlights practical scaling ceilings and tuning tips for multi-tenant GitOps setups on Kubernetes.

KubeDiagrams, the open-source tool for generating Kubernetes architecture diagrams, released v0.4.0 with a new --namespace option and improved support for custom resources. It now handles over 47 native Kubernetes types and integrates with Helm, Helmfile, and actual cluster states. This update makes it easier for platform teams to auto-document infrastructure directly from manifests or live clusters.

🔍 Observability & SRE

OllyGarden has introduced the Instrumentation Score, a new open-source standard to measure the quality of OpenTelemetry data. It analyzes telemetry streams against best practices and semantic conventions, giving teams a clear numerical score to assess instrumentation health.

A major outage on June 12, 2025, took down Google’s Identity and Access Management (IAM) system, affecting authentication across Firebase and other core services. This follows a similar 2023 incident and highlights risks of central authentication failures in serverless architectures.
For cloud teams, it’s a fresh reminder of the need for multi-region failover and alternative authentication strategies.

Gigapipe has introduced a fixed-cost observability platform that combines logs, metrics, traces, and profiling into a single backend. It offers compatibility with OpenTelemetry, Loki, Prometheus, Tempo, and Pyroscope without requiring custom agents. This could simplify observability stacks for cloud teams while avoiding variable usage-based costs.

Dynatrace now supports querying OpenTelemetry data using natural language via its MCP server and GitHub Copilot integration. Engineers can ask conversational questions in VSCode to retrieve logs, traces, and metrics directly from Dynatrace. This can simplify querying for teams still learning DQL and improve OTel workflows without needing deep query syntax knowledge.

InfraSight is a new open-source observability stack using eBPF for real-time syscall tracing on Linux and Kubernetes. It captures events like process execution, file access, and network connections, streaming data to ClickHouse for fast querying.
With gRPC pipelines, Kubernetes CRDs, and Helm charts, it aims to simplify low-level infrastructure observability without application changes.
Shreyans from Packt
09 Jun 2025
9 min read

Uber built a multi-cloud secrets platform to prevent leaks and automate security at scale

How to Block Up to 95% of Attacks Using AWS WAF
CloudPro #95

A better way to handle vendor security reviews?
If you've ever dealt with vendor onboarding or third-party cloud audits, you know how painful it can be: long email chains, stale spreadsheets, and questionnaires that don’t reflect what’s actually happening in the cloud.

We recently came across CloudVRM, and it’s a refreshingly modern take on the problem.

Instead of asking vendors to fill out forms or send evidence, CloudVRM connects directly to their AWS, Azure, or GCP environments. It pulls real-time telemetry every 24 hours, flags misconfigs, and maps everything to compliance frameworks like SOC 2, ISO 27001, and DORA.

It’s already being used by banks and infra-heavy orgs to speed up vendor approvals by 85% and reduce audit overhead by 90%.

Worth checking out if you're building or maintaining systems in regulated environments, or just tired of spreadsheet security.

This week’s CloudPro kicks off with something genuinely useful: a tool that replaces vendor security questionnaires with real-time cloud evidence.
📌 CloudVRM connects directly to AWS, Azure, or GCP and auto-checks compliance: no spreadsheets, no guesswork
📌 AWS CloudTrail silently skipping logs if IAM policies get too large (and attackers know it)
📌 PumaBot is now brute-forcing IoT cameras and stealing SSH creds

We’ve also got sharp engineering writeups: from how Uber rotates 20K secrets a month, to how Netflix handles 140 million hours of viewing data daily, to one team’s story of slicing a $10K Glue bill down to $400 with Airflow.

Hope you find something in here that saves you time, money, or migraines.

Cheers,
Shreyans Singh
Editor-in-Chief

🔐 Cloud Security

AWS CloudTrail logging can be bypassed using oversized IAM policies
Researchers at Permiso Security found that AWS CloudTrail fails to log IAM policies between 102,401 and 131,072 characters if they're inflated using whitespace.
This gap allows attackers to hide malicious changes from audit logs. The issue stems from undocumented size limits and inconsistent handling of policy data. AWS has acknowledged the problem and plans a fix in Q3 2025.

PumaBot targets Linux-based IoT surveillance devices via SSH brute force
A new botnet called PumaBot is targeting IoT surveillance systems by brute-forcing SSH access using IP lists from its command-and-control server. Written in Go, the malware disguises itself as system files, adds persistence through systemd, and installs custom PAM modules to steal credentials. Related binaries in the campaign also auto-update, spread across Linux systems, and exfiltrate login data.

How to Block Up to 95% of Attacks Using AWS WAF
This guide explains how to configure AWS Web Application Firewall (WAF) to block threats like SQL injection, XSS, bots, and DDoS attacks with minimal effort. By leveraging pre-built managed rules and setting up a Web ACL, users can protect apps behind ALB, CloudFront, or API Gateway without custom code.

CloudPEASS: Toolkit to find and exploit cloud permissions across AWS, Azure, and GCP
CloudPEASS helps red teamers and defenders map out permissions in compromised cloud accounts without modifying resources. It supports AWS, Azure, and GCP, detecting privilege escalation paths using API access, brute-force permission testing, and AI-assisted analysis. It also checks Microsoft 365 services in Azure and enables Gmail/Drive token access in GCP.

Uber built a multi-cloud secrets platform to prevent leaks and automate security at scale
To manage over 150,000 secrets across services and vendors, Uber developed a centralized secrets management platform. It blocks leaks in code with Git hooks, scans systems in real time, and consolidates 25 vaults into 6. The platform enables auto-rotation, access tracking, and third-party secret exchange via SSX.
It now rotates ~20,000 secrets monthly and is evolving toward secretless auth and workload identity federation.

⚙️ Infrastructure & DevOps

AWS Cost Explorer now offers a new Cost Comparison feature
AWS launched a new Cost Comparison feature in Cost Explorer that highlights key changes in cloud spend between two months. It automatically identifies top cost drivers, like usage shifts, discounts, or refunds, without needing manual spreadsheets. A new “Top Trends” widget shows the biggest changes at a glance, and deeper insights are now available through the Compare view.

Go-based Git Add Interactive tool adds advanced staging and patch filtering
This Go port of git add -i/-p enhances Git’s interactive staging with features like global regex filters, auto-hunk splitting, and multi-mode patch operations (stage, reset, checkout). It supports keyboard shortcuts, color-coded UI, and fine-grained hunk control across all files.

GitLab-based monorepo streamlines Terraform module versioning and security
This setup uses a GitLab CI pipeline to manage Terraform modules in a monorepo, with automated versioning, linting, and security scans via tools like TFLint, tfsec, and Checkov. Git tags handle module versions without extra auth tokens. The workflow enforces changelogs, labels, and approvals, and publishes docs and tags post-merge.

A fully automated fix for Terraform’s backend bootstrapping problem on Azure
This guide solves the common issue where Terraform needs a backend to store state, but can’t create it without an existing backend. It automates the creation of an Azure Blob backend using Terraform itself, then seamlessly switches to that backend by generating partial config files and migrating state.
The setup includes secure access via managed identity and GitHub OIDC, enabling CI/CD workflows without manual secrets or scripts.

Using Terraform to automate disaster recovery infrastructure and failovers
This post explains DR strategies like Pilot Light and Active/Passive, and shows how Terraform enables flexible, cost-efficient deployments using conditionals and modular IaC. A working AWS example demonstrates DNS failover and dynamic EC2 provisioning using a toggle variable. This lets teams switch between production and DR environments with minimal effort, reducing downtime and idle resource costs.

📦 Kubernetes & Cloud Native

Gateway API v1.3.0 Adds Smart Mirroring and New Experimental Controls
Gateway API v1.3.0 is now GA with percentage-based request mirroring, letting teams test blue-green deployments without full traffic duplication. The release also debuts experimental support for CORS filters, retry budgets, and listener merging via new X-prefixed APIs. These features help fine-tune request handling, scale listener configs across namespaces, and manage retry spikes, without upgrading Kubernetes itself.

Introducing Gateway API Inference Extension
The new Gateway API Inference Extension introduces model-aware routing for GenAI and LLM services running on Kubernetes. It adds InferenceModel and InferencePool resources to better match requests with the right GPU-backed model server based on real-time load. Early benchmarks show reduced latency under heavy traffic compared to standard Services, helping ops teams optimize resource usage and avoid contention.

Deep Dive into VPA 1.3.0: Smarter Resource Tuning for Kubernetes Pods
This post explores how the Vertical Pod Autoscaler (VPA) v1.3.0 uses historical and real-time metrics to recommend CPU and memory resource requests.
It focuses on the Recommender component, which aggregates usage into decaying histograms to auto-tune workloads and reduce resource waste.

Default Helm Charts Leave Kubernetes Clusters at Risk
Microsoft researchers warn that many open-source Helm charts deploy with insecure defaults, exposing services like Apache Pinot, Meshery, and Selenium Grid to the internet without proper authentication. These misconfigurations often include LoadBalancers or NodePorts with no access controls, making them easy targets for attackers. Teams should avoid "plug-and-play" setups and review YAML/Helm configs before deploying to production.

Batch Scheduling in Kubernetes: YuniKorn vs Volcano vs Kueue
Kubernetes lacks native support for batch workloads like ML training and ETL jobs, prompting the rise of tools like Apache YuniKorn, Volcano, and Kueue. YuniKorn replaces the default scheduler with strong multi-tenancy support; Volcano focuses on high-performance use cases with gang scheduling; and Kueue integrates natively to manage job queues without altering core scheduling.

🔍 Observability & SRE

What's new in Grafana v12.0
Grafana v12.0 introduces Git-based dashboard versioning, dynamic layouts, and experimental APIs for managing observability as code. Drilldowns for metrics, logs, and traces are now GA, enabling queryless deep dives across signals. SCIM support simplifies team provisioning, and a new “Recovering” alert state reduces flapping.

Sentry Launches Logs in Open Beta to Boost Debugging Context
Sentry now supports direct log ingestion in open beta, letting developers view application logs alongside errors and traces in a single interface. This integration adds vital context, like retry attempts or upstream responses, to help identify root causes faster without switching tools.

How to use Prometheus to efficiently detect anomalies at scale
Grafana Labs has built and open-sourced an anomaly detection system using only PromQL: no external tools or services required.
It computes dynamic bands using rolling averages, standard deviation, and seasonal patterns, with tunable sensitivity and smoothing to reduce false positives. The framework scales across tenants and works with any Prometheus-compatible backend, making it easy to plug into SLO-based alerts for better incident context.

Beyond API uptime: Modern metrics that matter
Traditional uptime checks fall short in today’s fast-paced environments where even minor API delays can cause major user churn. Catchpoint’s Internet Performance Monitoring (IPM) combines global synthetic tests, percentile-based metrics, and user-centric objectives to detect slowdowns before they escalate. With features like API-as-code, chaos engineering, and CI/CD integration, IPM helps teams catch latency issues early and simulate real-world failures.

Microservices Monitoring: Metrics, Challenges, and Tools That Matter
Monitoring microservices requires more than just uptime: it demands insight into latency, throughput, error rates, resource use, and inter-service communication. Tools like Middleware, Prometheus-Grafana, and Dynatrace help track these metrics at scale, support alerting, and simplify root cause analysis.
Best practices include centralized logging, distributed tracing, automation, and continuous optimization to maintain performance in complex distributed systems.
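
The Prometheus anomaly-detection item in this issue describes bands built from rolling averages and standard deviation. The original framework is written in PromQL; the following is only a small Python sketch of the underlying arithmetic, under the assumption that a band is simply the rolling mean plus or minus `k` standard deviations. It leaves out the seasonality and smoothing the summary mentions, and the names `anomaly_band` and `is_anomalous` are hypothetical.

```python
from statistics import mean, pstdev


def anomaly_band(samples: list[float], window: int, k: float = 2.0) -> tuple[float, float]:
    """Return (lower, upper) bounds from the last `window` samples.

    Band = rolling mean +/- k * rolling standard deviation.
    `k` plays the role of a sensitivity knob: larger k, wider band.
    """
    recent = samples[-window:]
    if not recent:
        raise ValueError("need at least one sample")
    mid = mean(recent)
    spread = pstdev(recent)  # population standard deviation of the window
    return mid - k * spread, mid + k * spread


def is_anomalous(value: float, samples: list[float], window: int, k: float = 2.0) -> bool:
    """True when `value` falls outside the band computed from history."""
    lower, upper = anomaly_band(samples, window, k)
    return value < lower or value > upper


if __name__ == "__main__":
    history = [8.0, 12.0, 8.0, 12.0]
    print(anomaly_band(history, window=4))       # mean 10, stddev 2, so (6.0, 14.0)
    print(is_anomalous(15.0, history, window=4))  # outside the band
    print(is_anomalous(13.0, history, window=4))  # inside the band
```

In PromQL the same two ingredients would typically come from range-vector functions such as avg_over_time and stddev_over_time; the Python version is only meant to make the band calculation concrete.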

Shreyans from Packt
23 Jun 2025
11 min read

Which call paths dominate at runtime: using Flame Graphs to visualize it!

By Kaiwan N Billimoria

CloudPro #97

This week’s CloudPro is a guest special from Kaiwan N Billimoria, the author of Linux Kernel Programming. Kaiwan runs technical Linux OS training programs (corporate and individual-online) at https://kaiwantech.com.

In today’s issue, Kaiwan walks us through Flame Graphs: a powerful tool to visualize which call paths dominate at runtime and uncover performance bottlenecks.

If you want to go deeper, his book Linux Kernel Programming is available for just $9.99 as part of Packt’s Summer Sale.

Cheers,
Shreyans Singh
Editor-in-Chief

P.S. If you’re into platform engineering, check out Platform Weekly: the world’s largest newsletter for platform engineers with 100,000+ readers. Subscribe here.

P.P.S. DeployCon is happening June 25: an engineer-first GenAI summit featuring teams from Meta, Tinder, DoorDash, and more. Join in person at the AWS Loft SF or online. Register now.

Which Call Paths Dominate at Runtime: Using Flame Graphs to Visualize it!

Analyzing workloads is something all engineers end up doing at some point or another (or it’s in their job description!). An obvious reason is performance analysis; for example, CPU usage may spike at times, causing issues or even outages.

The need of the hour: observe, analyze, and figure out the root cause of the performance issue!
Of course, that’s often easier said than done; this kind of work can bog down even experienced professionals...

Borrowing from Brendan Gregg’s wonderful presentation (though old, it’s still relevant):

In general, answering the ‘Who’ and the ‘How’ is simple(r):

  • ‘Who?’: well-known tools like top (and its numerous variants: htop, atop, etc.) help answer this question.
  • ‘How?’: lots of system monitoring tools are available (vmstat, dstat, sar, nagios, cacti, nmon, iostat, nethogs, sysmon, etc.).

The harder questions tend to be the ‘Why?’ and the ‘What?’:

  • ‘Why?’: by generating a Flame Graph! (the topic of this short article)
  • ‘What?’: Flame Graphs as well as plain old perf!

The following slide illustrates this (again, from Brendan Gregg):

Right. So what the heck’s this Flame Graph thingy? Let’s explore!

We’ll abbreviate Flame Graphs as FG.

There are several types of FGs (CPU, GPU, memory, off-CPU, etc.); here we keep the focus on just one: CPU FGs via Linux’s powerful perf CPU profiler.

Any tool that can generate profiling data that includes stack traces can be used to generate FGs! Thus, several profilers, perf among them, can supply the data:

  • Windows: WPA, PerfView, Xperf.exe
  • Linux: perf, eBPF, SystemTap, ktap
  • FreeBSD: DTrace
  • Mac OS X: Instruments

We’ll focus only on using Linux perf; it’s considered one of the best modern CPU profiling tools on the platform.

Motivation for FGs

With perf, you can indeed profile your workload and see where exactly CPU usage shoots up. It’s easy: record something, get the report, and analyze it (well… it sounds easy at least).

Example:

Record a system-wide (-a option switch) profiling session with stack backtraces (--call-graph dwarf; the shorter -g switch also records call graphs, using frame pointers by default), at a sampling frequency of 99 Hz, for 10 seconds:

    sudo perf record -F 99 -a --call-graph dwarf -- sleep 10

(Instead of the -a option switch, you can use the -p PID option to profile a particular process.
The generated perf.data file is owned by root; do a chown to place its ownership under your account if you wish.)

Get the perf report:

    sudo perf report --stdio # or --tui

(Try it!)

This raises the question: why not just use perf? Ah, that’s the thing: on non-trivial workloads, the report can be simply humongous, even going into dozens of (printed) pages! Are you really going to read through all of it, trying to spot the outliers?

Visualization with the CPU Flame Graph

That’s why we use the so-called Flame Graph (FG): to visualize dense textual data and make sense of it; it’s so much clearer (so much more humane, literally).

Installation

First off, ensure both the perf utility and the FlameGraph scripts are installed.

Quick note: to install perf on Ubuntu/Debian, you typically need to be on a distro kernel (not a custom one). Why? Because, unusually for an app, it’s tightly coupled to the kernel it runs on! Assuming you’re on Ubuntu, do this:

    sudo apt install linux-tools-$(uname -r) linux-tools-generic

(Even the linux-tools-generic package alone might be sufficient; on Debian, the package is named linux-perf.)

If you’re on a custom-built kernel, build perf (it’s easy):

    cd <kernel-src-tree>/tools/perf ; make

Install the FlameGraph scripts by cloning the repo (in an empty folder):

    git clone --depth 1 https://github.com/brendangregg/FlameGraph.git

Steps to generate a Flame Graph

1. Profile the workload using perf:

    perf record -F 99 --call-graph dwarf [-a]|[-p pid]

   -a: all CPUs; in effect, if specified, the sampling is system-wide
   -p: sample a particular process

   This generates the perf.data binary file.

2. Convert the binary data to human-readable stack traces via perf script, which reads from perf.data by default (else use -i <fname>):

    perf script > perfscript_out.dat

3. Generate the FG, a Scalable Vector Graphics (SVG) file. The FG repo includes several stackcollapse-* scripts; we use the stackcollapse-perf.pl one:

    cat perfscript_out.dat | FlameGraph/stackcollapse-perf.pl \
        | FlameGraph/flamegraph.pl > out.svg

4. Open the SVG in a web browser,
and move the mouse over the stack frames.

A Quick Test Run

We’ll assume you’ve installed both perf and the FlameGraph GitHub repo (the latter under your home dir).

Profile: record everything for 10s, then generate the SVG:

    sudo perf record -F 99 -a --call-graph dwarf -- sleep 10
    sudo chown ${LOGNAME}:${LOGNAME} perf.data
    perf script > perfscript_out.dat
    cat perfscript_out.dat | ~/FlameGraph/stackcollapse-perf.pl | ~/FlameGraph/flamegraph.pl > out.svg

Open the SVG file in a web browser. Here’s a screenshot of the Flame Graph:

Hmm, better if we zoom in… so I click on one of the rectangles on the lower-left (say on the gnome-shell one):

Ah, better.

Interpreting the Flame Graph

Some really key points regarding how to interpret the Flame Graph:

  • Each rectangle represents a single stack frame; read it bottom-up.
  • The width of a rectangle is proportional to how often that function appeared in the sampled stacks (not to how many times it was called).
  • The height is representative of the depth of the stack.
  • The order of rectangles from left to right is just alphabetical; it's not a timeline.
  • The colors don’t signify anything special.
  • You can (typically) use the browser Search (Ctrl-F) to search for a function by name.
  • Click on a stack frame (a rectangle) to zoom into that tower. Click Reset Zoom (upper-left corner) to zoom back out.

In effect: the hottest code paths, the ones that dominate, are the widest rectangles!

The top edge (the rectangle at the very top) is the function on-CPU; beneath it is its ancestry (how it was invoked).

Here’s another FG I captured while SSH was running (truncated screenshot showing the interesting portion):

Interesting; the “towers” seem to be inverted! Yes, they’ve become top-down (downward-growing stacks) instead of bottom-up… they’re called icicles! An option to the flamegraph.pl script (--inverted) sets this up.

A fantastic thing about the FG is that both userspace and kernel-space functions are captured! It’s thus called a mixed-mode FG.
For example, with the ‘ssh’ FG, you can clearly see the call path leading down to the kernel network protocol stack code: functions from the socket/INET layer (sock_*()), followed by the L4 tcp_*() functions, followed by the L3 ip_*() functions; even the invocation of the (network) device transmit routines (dev_hard_start_xmit() and others) is visible!

My flame_grapher.sh wrapper scripts

Next, to make this a bit easier to use (no need to remember the syntax, easier options), I wrote a wrapper over the original Flame Graph scripts; the top-level one’s named flame_grapher.sh: https://github.com/kaiwan/L5_user_debug/tree/main/flamegraph (it forms a portion of my ‘Linux Userspace Debugging – Tools & Techniques’ training repo).

Its help screen reveals how you can (very easily!) use it to generate FGs:

    $ ./flame_grapher.sh
    Usage: flame_grapher.sh -o svg-out-filename(without .svg) [options ...]
      -o svg-out-filename(without .svg): name of SVG file to generate (saved under /tmp/flamegraphs/)
    Optional switches:
      [-p PID]: PID = generate a FlameGraph for ONLY this process or thread
          If not passed, the *entire system* is sampled...
      [-s <style>]: normal = draw the stack frames growing upward [default]
          icicle = draw the stack frames growing downward
      [-t <type>]: graph = produce a flame graph (X axis is NOT time, merges stacks) [default]
              Good for performance outliers (who's eating CPU? using max stack?); works well for multi-threaded apps
          chart = produce a flame chart (sort by time, do not merge stacks)
              Good for seeing all calls; works well for single-threaded apps
      [-f <freq>]: frequency (HZ) to have perf sample the system/process at [default=99]
          Too high a value here can cause issues
      -h|-?: show this help screen.
    Note:
    After pressing ^C to stop, please be patient...
    it can take a while to process.
    The FlameGraph SVG (and perf.data file) are stored in the volatile /tmp/flamegraphs dir; copy them to a non-volatile location to save them.

Notice a few points:

  • The only mandatory option switch is -o fname; it generates an SVG file named fname.svg.
  • There are two ‘types’ of FGs we can generate:
      • graph [default]: produces an FG (the X axis is NOT time; stacks are merged). This type’s good for spotting performance outliers (who's eating CPU? using max stack?); it works well for multi-threaded apps.
      • chart: produces a flame chart; it’s sorted by time and does not merge stacks. Good for seeing all calls; it works well for single-threaded apps.
  • You can optionally specify a particular process (by -p PID) to profile, change the style to icicle, and set the profiling frequency.
  • The metadata and the SVG are stored under /tmp; copy them to a non-volatile location if you want them saved!

(Do read README.md as well. Hey, this wrapper’s lightly tested; please help me (and everyone!) out by raising Issues as and when you come across them!)

Tip: Try the speedscope.app site to interact with your FlameGraph!

Flame Graphs: Caveats/Issues

  • Frame pointers being present helps get good stack traces, BUT -fomit-frame-pointer is the typical GCC flag passed! A possible exception is the Linux kernel itself; it has intelligent algorithms to emit accurate stack traces even in the absence of frame pointers.
  • Symbols are required (can use a separate symbol file).
  • A side effect of missing symbols may be ill-formed (or close to zero) stack traces.
  • VMs may not support the PMCs (performance monitoring counters) that perf requires; in that case, FGs (or perf) don’t really work well.

Bonus material

B Gregg’s Linux Performance Observability Tools diagram across the stack!

Tips

  • With [e]BPF becoming a powerhouse for many things, including observability, do look up the equivalent eBPF tooling as well: https://www.brendangregg.com/ebpf.html (a similar diagram’s here!).
  • Also be sure to check out B Gregg’s (and others’) utility package wrappers: perf-tools and bpfcc-tools.
  • Don’t ignore systemd’s systemd-analyze tool (boot-time).
  • Perf: simply running sudo perf top is itself useful to find outliers; I keep a couple of aliases as well:

    alias ptop='sudo perf top --sort pid,comm,dso,symbol 2>/dev/null'
    alias ptopv='sudo perf top -r 80 -f 99 --sort pid,comm,dso,symbol \
    --demangle-kernel -v --call-graph dwarf,fractal 2>/dev/null'
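As a rough textual counterpart to reading the top edge of a Flame Graph, the folded stacks that stackcollapse-perf.pl emits (one "frame;frame;... count" line per unique stack) can be summarized by their leaf frame, i.e. the function that was on-CPU. The following is only a minimal sketch under stated assumptions: it is not part of the FlameGraph repo or of the wrapper described above, and the function names top_on_cpu and sample_folded, as well as the sample data, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize folded stacks by on-CPU (leaf) function.
# Input (stdin): lines of the form "frame1;frame2;...;leaf COUNT".
# Output: "COUNT PERCENT% LEAF", largest sample count first.
top_on_cpu() {
    awk '
    {
        count = $NF                      # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # drop the trailing count
        n = split(stack, frames, ";")    # frames[n] is the leaf (top edge)
        total += count
        leaf[frames[n]] += count
    }
    END {
        for (f in leaf)
            printf "%d %.1f%% %s\n", leaf[f], 100 * leaf[f] / total, f
    }' | LC_ALL=C sort -k1,1nr -k3,3
}

# Made-up folded stacks, used only to exercise the function.
sample_folded() {
    cat <<'EOF'
ssh;main;do_write;tcp_sendmsg 30
ssh;main;do_write;ip_output 10
gnome-shell;main;paint 50
ssh;main;read_loop 10
EOF
}
```

With real data, the function would sit at the end of the pipeline shown earlier, for example: perf script | ~/FlameGraph/stackcollapse-perf.pl | top_on_cpu. It only counts leaf frames, so it says nothing about ancestry; the Flame Graph remains the better tool for seeing how a hot function was reached.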