ec2: elastic compute cloud (virtual host)
vpc: virtual private cloud
eni: elastic network interface
ami: amazon machine image
az: availability zone
--------------------------------
ebs: elastic block store -- volume
ccp level: certified cloud practitioner
ec2 instance store -- hardware attached volume (disk)
efs: elastic file system
--------------------------------
elb: elastic load balancing
rds: relational database service
idps: intrusion detection and prevention systems
tls: transport layer security -- encrypted connections - X.509
sni: server name indication
acm: aws certificate manager
asg: auto scaling group
--------------------------------
rds: relational database service
iam: identity and access management
--------------------------------
route 53
tld: top-level domains
sld: second-level domains
A AAAA CNAME NS, CAA DS MX NAPTR PTR SOA TXT SPF SRV
AWS CloudShell
$EC2_AVAIL_ZONE
TTL: time to live
AWS Route 53 specific: Alias record
cw: CloudWatch
--------------------------------
WhatsTheTime.com -- 3 tier architecture (client, web, database)
MyClothes.com
  stickiness / session affinity
  ElastiCache (DynamoDB)
  RDS
MyWordPress.com
  Aurora MySQL
  EBS
  EFS -- multiple ENI
Instantiating quickly
  Golden AMI
  User Data scripts
  Elastic Beanstalk
  restore from snapshot
Elastic Beanstalk
  create app - upload version - launch environment - manage environment
  web tier vs worker tier
  SQS queue
--------------------------------
Amazon S3
buckets ~ top directories -- globally unique name
  defined at region level
  naming convention: no uppercase, no underscore, not an ip, 3-63 long, starts with alphanumeric
objects ~ files -- key == full path
pre-signed url
sse: server-side encryption
kms: key management service
cmk: customer master key
arn: amazon resource name (e.g.
that of a kms master key)
cors: cross-origin resource sharing
  preflight request
--------------------------------
Section 13 -- sdk, iam roles and policies
managed policies, attached to user, but also 'custom' (created)
inline policies -- added to user, group or role (not recommended)
aws policy simulator: https://policysim.aws.amazon.com/home/index.jsp
ec2 instance metadata: http://169.254.169.254/latest/meta-data
python sdk: boto3
us-east-1 region chosen by default
--------------------------------
Section 14: S3 and Athena
MFA delete -- root credentials, from the cli
  set up a virtual device, under the root account, in the security credentials
  configure the root account to use the device -- create access keys...
  aws configure --profile
  command found under s3-advanced/mfa-delete.sh
  aws s3api put-bucket-versioning --bucket demo-stephane-mfa-delete-2020 \
    --versioning-configuration Status=Enabled,MFADelete=Enabled \
    --mfa "arn:aws:iam::001736599714:mfa/root-account-mfa-device 710343" \
    --profile root-mfa-delete-demo
  Note: special bucket name (he just wanted to enable this on a temp bucket?)
  'mfa-code' (710343) taken directly from the (mfa) application...
  Disable MFA-delete: same command with 'Disabled'
S3 access logs: do not set the logging bucket as the one monitored
Replication: CRR & SRR
  cross-region and same region
  enable versioning
  other account OK
  delete operations with a version ID are not replicated
  no chaining
  under 'management'
pre-signed urls -- sdk or cli, valid for 1 hour by default
  in the 'code', under s3-advanced, pre-signed-url.sh
  in case of errors: aws configure set default.s3.signature_version s3v4
storage classes:
  - s3 standard - general purpose
  - s3 standard infrequent access (IA)
  - s3 one zone infrequent access (e.g.
for thumbnails -- can re-create)
  - s3 intelligent tiering
  - glacier
  - glacier deep archive
lifecycle rules to 'transition' storage classes, or 'expire' objects
analytics -- just for standard to standard-IA
performance -- per 'prefix'
  bucket(/prefix/)object
  kms limitation -- service quota console to upgrade
  multi-part upload recommended > 100 MB -- mandatory for > 5 GB
  edge locations -> accelerate transfers
  byte range -> speed up downloads
select & glacier select -- SQL filter (server side)
event notifications -- rules (in properties)
  sns: simple notification service
  sqs queue (policy!)
  lambda
  versioning required
requester pays (optional)
Athena
  serverless query service to perform analytics, using sql (built on Presto)
  create database -- again in 'code'
  SQL queries (select, group...)
Glacier Vault Lock -- WORM: write once, read many
  also for s3: object lock
  governance and compliance modes... (cannot be changed)
--------------------------------
Section 15 CloudFront and accelerator
edge locations -- aws vpn, cached locally
CDN: content delivery network
OAI: origin access identity
ec2 as origin: must be public
  allow all public IPs of edge locations
global accelerator
  unicast vs anycast IP
aws shield -- protection against ddos
--------------------------------
Section 16 Storage
snow -- devices sent (snowball edge, snowcone, snowmobile)
  edge computing
  OpsHub
  glacier: only from s3
FSx
  third party high performance file systems (e.g.
NetApp ONTAP)
  Windows (single or multi AZ) (same as EFS but for Windows)
  Lustre (Linux & cluster) large-scale computing
  HPC: High Performance Computing
  scratch or persistent fs (single AZ)
hybrid cloud
  aws storage gateway
  s3 efs
  file, volume, or tape gateway
  file: nfs/smb
  volume: iSCSI
  VTL: virtual tape library -- glacier
AWS Transfer using ftp
EBS and Instance storage: only for one EC2
--------------------------------
Section 17 SQS, SNS, Kinesis, Active MQ
asynchronous models:
  - sqs: queue
  - sns: pub/sub
  - kinesis: streaming
sqs: simple queue service -- standard queue (producers, consumers, messages, send/poll)
  at-least-once delivery (possible duplication of messages)
  best effort ordering
  (message max size: 256 KB)
  retention time (up to 14 days) before being dropped from the queue
  AWS Lambda: serverless service
  DeleteMessage API
  ASG: auto scaling group
  CloudWatch metric: queue length
  SQS access policies (e.g. for cross-account)
  Message visibility timeout (30s default)
  ChangeMessageVisibility API
  DLQ: Dead Letter Queue: MaximumReceives threshold
  DelayQueue
  Long Polling
  Request/Response queues -> SQS Temporary Queue Client
  FIFO queue
  Limited throughput
  SQS Auto Scaling Group: need a custom metric
  decouple producer and consumer (independent scaling groups)
SNS: Simple Notification Service -- many receivers
  Notification publisher/subscriber
  CloudFormation may be a client of SNS
  SNS + SQS: Fan Out pattern. e.g.
S3 Events
  Message Filtering
Kinesis
  - KDS: Data Streams - "shards" (for scaling) - 1 MB/s or 1000 msg/s per shard
    Retention, immutability
  - Data Firehose - batch writes (dest: AWS S3/Redshift/ElasticSearch)
  - Data Analytics (SQL application)
AMZ OpenSearch Service aka ElasticSearch
AMZ MQ -- managed Apache ActiveMQ
  active / standby
  storage with EFS, for both, for failover
--------------------------------
Section 18: Containers on AWS: ECS, Fargate, ECR & EKS
ECR: Elastic Container Registry
Hypervisor
AWS Fargate serverless
AWS EKS
ECS: service -- have to provision the EC2 instances
Launch types:
  1. Amazon EC2 Launch Type: EC2 instances in one same region, cluster, VPC
     Run ECS agent in all Container instances
  2. Fargate -- serverless
     many (Task + ENI) items
IAM roles for ECS tasks
  - EC2 Instance Profile, used by the ECS agent
  - ECS Task Role
ECS Data Volumes: integration with EFS
  Mount volumes to instances or tasks
Load Balancer for the EC2 instance.
  Random ports assigned to the tasks, known to the LB
  Allow any port from the LB
LB for Fargate: same port for every task
Event Bridge integration
ECS Scaling
  2 levels of scaling
ECS Rolling Update, with Min/Max
ECR: vulnerability scanning, S3 backed
CodeBuild
  CICD to automate build/push/pull of images
EKS
  alternative to ECS, open source API
  Pods running on nodes
--------------------------------
Section 19: Serverless Overviews: Lambda, API Gateway, DynamoDB, Cognito
FaaS: function as a service (originally)
AWS Lambda: on-demand, short execution
Lambda Container image
url: s:/discover:/begin:
limits per region
Lambda@Edge in CloudFront: modify Viewer|Origin Request|Response
DynamoDB: Global no sql DB, replication across multiple AZ
  Tables, Primary key, row, attributes; max item size: 400 KB
  Read/Write Capacity Modes: Provisioned / On-Demand
  Provisioned: RCU/WCU: read/write capacity unit
  On-Demand: for unpredictable workload
  DAX: DynamoDB Accelerator (cache)
  Streams
    ordered stream (log) of CRUD modifications
  Global Tables --
two-way (read/write) replication across regions
  TTL
    ExpTime
  Indexes
    GSI/LSI -- query on other than primary keys
  Transactions (atomic)
API Gateway: invoke lambda functions via REST API
  EndPoint types: Edge-Optimized / Regional / Private
  API: HTTP, WebSocket, REST
  Permissions
    IAM: Sig v4
    Lambda authorizer -- cached
    Cognito User Pools -- authentication, not authorization
Cognito
  - CUP: User Pools -> send back JSON Web Token (JWT)
  - Identity Pools (Federated Identity)
  - Sync
  STS temporary credentials
SAM: Serverless Application Model -- yaml
--------------------------------
Section 20: Serverless Solution Architecture Discussions
- Mobile Application: MyTodoList
  REST https: API GW + Lambda + DynamoDB + Cognito
  AWS STS + S3
  DAX caching
- Hosted website: MyBlog.com
  welcome email plus thumbnail
  S3 + CloudFront (Edge) + OAI bucket policy
  API GW + lambda + DAX + DynamoDB (Global) + Stream/Lambda + SES
  SES: Simple Email Service
  optional: SQS / SNS
- Micro Services Architecture
  each one with a REST API
  Elastic LB + ECS + DynamoDB
  Route 53
  CIDR: Classless Inter-Domain Routing, e.g. 192.0.2.0/24
  API GW + Lambda + ElastiCache
  Elastic LB + EC2 Auto Scaling + RDS
  Synchronous patterns -> API GW, LB
  Asynchronous: SQS, Kinesis, SNS, Lambda...
- Distributed paid content
  Signed URLs generated with API GW + Lambda + DynamoDB
  Cognito
- Software updates offloading
  ALB + EC2 + EFS -> CloudFront
- Big Data Ingestion Pipeline
  IOT Devices -> IOT Core -> Kinesis Streams -> Firehose -> (Ingestion) S3
  Lambda
  SQS + Lambda + Athena + S3
  QuickSight or RedShift
--------------------------------
Section 21: Databases in AWS
questions:
  - read heavy? Or balanced? Throughput? Fluctuating during the day?
  - how much data for how long to save? Average object size? Access?
  - Data durability? Source of Truth?
  - Latency requirements? Concurrent users?
  - Data model? How to query? Joins? Structured?
  - Schema? Search?
  - License costs?
RDBMS (=SQL/OLTP): RDS, Aurora -- joins
NOSQL: DynamoDB, ElastiCache, Neptune - no SQL, no join
Object Store: S3 (big objects) / Glacier (backup, archive)
Data Warehouse (analysis): Redshift (OLAP), Athena
Search: ElasticSearch
Graphs: Neptune (relationships between data)
- RDS: several engines supported, provision an EC2 instance and EBS volume
  OLTP: Online transaction processing
  Operations: small downtimes
  Security: configure
  Reliability: Multi AZ if enabled
  Performance: depends on EC2 type and EBS volume type. Flexible.
  Storage auto-scaling
  Cost: pay per hour
- Aurora: compatible API with PostgreSQL / MySQL
  Data in 6 replicas in 3 AZ
  Auto healing
  Read replica may be global
  DB may be global for DR or latency
  Autoscale
  serverless and multimaster options
- ElastiCache
  Redis, Memcached
  Must provision EC2 instance
  sharding
  Key/Value store
  Security: no IAM
- DynamoDB
  proprietary, noSQL
  serverless
  can replace ElastiCache (not quite as fast, except with DAX)
  Query on primary or sort keys only
  small documents
- S3
  big objects
- Athena
  SQL layer on top of S3, Presto engine, for logs etc.
- Redshift: PostgreSQL for analytics
  MPP: Massively Parallel Query Execution
  two types of nodes: 'Leader' and 'Compute'
  Only one AZ -- but snapshots
  Data from: Kinesis FireHose / S3 using COPY / EC2 instance through JDBC driver
  Spectrum: thousands of nodes launched to run queries in S3, without loading the data
  faster than Athena -- indexes
- Glue. ETL: Extract, Transform and Load
  from S3 or RDS
  Data Catalog -- Crawlers to populate the catalog from databases -- (EMR?)
- Neptune: graphs
  3 AZ
- OpenSearch aka ElasticSearch
  Complement to other DB (e.g. DynamoDB)
  Kibana and Logstash -- ELK stack
  Also Cognito
--------------------------------
Section 22: Monitoring & audit: CloudWatch, CloudTrail & Config
Metrics, namespaces, Dimension
Custom metrics can be pushed in the (recent) past (?)
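On the '(?)' above: a minimal sketch of the timestamp window for custom metrics, assuming the limits given in the PutMetricData documentation (data points up to two weeks in the past and up to two hours in the future). The helper name `timestamp_accepted` is made up for illustration and nothing here calls AWS; with boto3 the timestamp would go into the `Timestamp` field of a `MetricData` entry passed to `put_metric_data`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed window, taken from the PutMetricData documentation (verify before relying on it)
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def timestamp_accepted(ts: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a timezone-aware timestamp falls inside the assumed window."""
    now = now or datetime.now(timezone.utc)
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


if __name__ == "__main__":
    three_days_ago = datetime.now(timezone.utc) - timedelta(days=3)
    print(timestamp_accepted(three_days_ago))
```

Both datetimes must be timezone-aware; mixing naive and aware values raises a TypeError in the comparison.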
CLI
Dashboards: can be from different accounts and regions
Logs: groups, streams, expiration policies
  Exporting, CreateExportTask
  Sources
  Filter & Insights, trigger alarms
  Log Aggregation
CloudWatch Agent on EC2 instance or on on-premise server
  Also Unified Agent (metrics and logs)
Test from CLI -- set-alarm-state
EC2 instance recovery: same IP etc.
Events: intercept some patterns, schedule...
  Now: EventBridge -- multiple buses
  also from outside AWS
  Schemas may be versioned
CloudTrail: governance, compliance, and audit (API calls)
  - management events
  - data events (not logged by default)
  - insights events (need to enable and pay for it)
  Retention: 90 days by default
AWS Config -- per region
  Remediations using SSM Automation documents
--------------------------------
Section 23: IAM
STS: Security Token Service -- temporary access
  AssumeRole -- but prefer Cognito for 3rd party identity
  GetSessionToken for MFA
Identity Federation and Cognito
  user management outside AWS
  SAML 2.0 (Security Assertion Markup Language) -- Active Directory (LDAP), ADFS
  New way: AWS SSO
  Custom Identity Broker Application
  Cognito
AWS Directory Services:
  - Managed Microsoft AD -- supports MFA
  - AD Connector - Gateway (proxy)
  - Simple AD
Organizations - one master, many members
  Multi account strategies
  Organizational Units (OU)
  SCP: Service Control Policies
    whitelist or blacklist, at the OU or account level - not master
  Moving Accounts: remove from old, send invite, accept the invite
Conditions: SourceIp, RequestedRegion, Tags, MFA
IAM for S3: Action; Bucket or Object level
Role vs Resource Based Policy
Permissions Boundary (only applies to user or role, not to group)
RAM: Resource Access Manager -- share some resources
AWS SSO
--------------------------------
Section 24: AWS Security, Encryption, etc.
SSL in flight -- SSL Certificates
MITM: Man in the Middle
Server side encryption 'at rest'
Client side encryption -- Envelope Encryption
KMS: Key Management Service
  CMK: Customer Master Key
    - symmetric AES-256
    - asymmetric (RSA & ECC)
  KMS pay service - esp for user keys (managed free)
  Only for data up to 4 KB -- more: envelope encryption
  KMS per region
  Key Policies
  Custom policy: define user and roles who can access
  snippet under code/kms
  Key Rotation -- 1 year by default, but manual possible
SSM Parameter Store (hierarchical)
  Parameter Policies -- TTL
  Systems Manager / Parameter Store (in the menu)
  CLI: aws ssm get-parameter*
Secrets Manager -- integration with RDS etc.
  Alternative to Parameter Store
CloudHSM
  provision encryption hardware
Shield -- DDOS
  - Standard, free, activated by default
  - Advanced. DRT: DDoS Response Team
WAF: Web Application Firewall
  Only for: Application Load Balancer, API Gateway, CloudFront
  -> Define Web ACL
GuardDuty
  intelligent threat discovery -- Machine Learning
  also CryptoCurrency Attacks (?)
Inspector -- automated security assessments for EC2 instances
  -> install the Inspector Agent on the EC2 instance
Macie: managed data security and data privacy service
  PII: Personally Identifiable Information
Shared Responsibility Model -- Security of the Cloud
--------------------------------
Section 25: Networking VPC
CIDR, IANA private:
  - 10.0.0.0/8 (big nw)
  - 172.16.0.0/12 AWS default
  - 192.168.0.0/16 e.g. home nw
  all the rest: public!
https://www.ipaddressguide.com/
There is a default vpc, but create your own!
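The three private ranges listed above can be checked with Python's standard `ipaddress` module. This is a rough sketch for IPv4 only; the function names `is_rfc1918` and `subnet_is_private` are made up for illustration.

```python
import ipaddress

# The three private IPv4 ranges from the notes above
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if a single IPv4 address falls in one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_RANGES)


def subnet_is_private(cidr: str) -> bool:
    """Return True if a whole IPv4 CIDR block sits inside one private range."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(private) for private in PRIVATE_RANGES)


if __name__ == "__main__":
    # 172.31.0.0/16 is the block the default VPC typically uses; 192.0.2.0/24 is outside all three ranges
    for cidr in ("172.31.0.0/16", "192.0.2.0/24"):
        print(cidr, subnet_is_private(cidr))
```

A check like `subnet_is_private` is handy before choosing a CIDR for a new VPC, since anything outside these ranges is public address space.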
AWS reserves 5 addresses in every subnet
  - 0 nw address
  - 1 vpc router
  - 2 DNS
  - 3 future
  - 255 broadcast address -- not supported
IGW: Internet Gateway
public subnet: enable auto-assign public ip
bastion host -- hop through
NAT instance -- Nw Address Translation
  Fixed Elastic IP
  source/destination check (disable)
NAT Gateway
  private subnet -> NATGW (in the public subnet) -> IGW
DNS Resolution
  enableDnsSupport: Route 53 Resolver supported
  enableDnsHostnames: hostnames for private IP addresses
NACL & Security Groups -- stateful (SG) vs stateless (NACL)
  default NACL: open -- don't modify: create another
  Ephemeral ports -- need to open full port ranges!
  index rule precedence
VPC Reachability Analyzer
VPC Peering -- not transitive; need to add routes
VPC Endpoints aka AWS PrivateLink
  - Interface endpoint -- ENI (private IP), must attach Security Group
  - Gateway endpoint -- S3 and DynamoDB (per region!)
VPC Flow Logs: S3 or CloudWatch
Site-to-Site VPN
  VGW: Virtual Private Gateway
  CGW: Customer Gateway (on premises)
  enable ICMP for ping
VPN CloudHub -- hub-and-spoke model - over the public internet
BGP: Border Gateway Protocol
Direct Connect (DX)
  Direct Connect Endpoint, Customer or partner router
  Direct Connect Gateway
  Private virtual interface
  Lead times often longer than one month to establish
  IPsec encrypted private connection -- as not encrypted by default
  High Resiliency for Critical Workloads
  Maximum Resiliency:
2x2 connections
PrivateLink -- VPC Endpoint Services
  expose a service to 1000s of VPC
  bw Network LB and ENI (in Customer VPC)
EC2 Classic and ClassicLink (deprecated)
Transit Gateway -- hub-and-spoke (star) connection - Supports IP Multicast
  ECMP: Equal Cost multi-path
Traffic Mirroring -- Source ENI, Targets (ENI or NLB)
IPv6
  3.4 x 10^38 unique IP addresses (8xhex4) -- dual-stack mode
Egress-only Internet Gateway for IPv6
  ::/0
Billing dashboard
CIDR, VPC, subnet, Internet GW, route table, bastion, NAT, DNS, NACL, SG, Reachability Analyzer,
VPC Peering, Endpoints, Flow Logs, Site-to-Site VPN, VPN CloudHub, Direct Connect, DC GW,
PrivateLink, Transit GW, Traffic Mirroring, Egress-only Internet GW for IPv6
15/22...
-Jan 23-------------------------------
Networking costs per GB
  - incoming traffic to EC2: free
  - private IP bw EC2 in same AZ: free
  - EC2 bw AZ, public IP or elastic IP: cost; private IP: half the price
  - EC2 bw regions: cost
  egress: cost, ingress: free
  direct connect in same region: minimized
S3
  - ingress: free
  - egress: cost
  - transfer accelerator: additional cost
  - s3 to cloudfront: free
  - cloudfront egress cheaper
  - cross-region replication: cost
nat gw vs gw VPC endpoint
  - ec2 -> nat gw -> internet gw -> internet: 4xcost
  - vpc endpoint -> s3: much cheaper
--------------------------------
Section 26: Disaster Recovery and migrations
  - on premise -> on premise: expensive
  - on premise -> AWS cloud: hybrid
  - AWS reg a -> aws reg b
RPO: Recovery Point Objective: backup frequency, i.e. size of data loss
RTO: Recovery Time Objective: downtime for recovery
  - backup & restore
  - pilot light
  - warm standby
  - hot site / multisite approach
  -> faster RTO further down the list
  1. AWS snowball: ~one week RPO; snapshots: ~daily
  2. small version always running in the cloud -- critical core (e.g. rds)
  3. up and running in the cloud, at minimum size
  4. expensive, but low rpo/rto -- esp.
with all aws
Tips:
  backup: snapshots, glacier, snowball or stg gw
  high availability: route53, rds multiAZ, efs, s3, direct connect
  replication: aurora, global db, stg gw
  automation: cloudformation, beanstalk, cloudwatch, lambda
  chaos testing: netflix simian army -- random
Database Migration: DMS
  resilient, self-healing
  homogeneous or heterogeneous
  continuous data replication: CDC -- Change Data Capture
  need an EC2
  sources and targets:
    s: on premise, azure, rds, aurora, s3...
    t: on premise, rds, redshift, dynamodb, S3, elasticsearch, kinesis, documentdb
SCT: Schema Conversion Tool
  DMS+SCT -- postgres: no need for SCT
  setup SCT on premise
  Data migration from (e.g. Oracle) DB to DMS replication instance (+CDC)
    -> RDS for MySQL (with schema conversion from SCT)
On-Premise Strategies
  - download AMI as a VM (.iso format)
    into vmware, KVM, VirtualBox (Oracle), MS Hyper-V
  VM Import/Export
    into EC2, e.g. for DR recovery
  Application Discovery Service: plan a migration
    AWS Migration Hub
  AWS DMS (Migration) replicate
  Server Migration Service SMS -- incremental live
AWS DataSync -- large amount of data (also from NAS via NFS or SMB)
  agent on premise, DataSync Service in the region
  every hour or day -- not continuous
  Also EFS to EFS bw regions
Summary
  - 200TB with 100Mbps connection: 185d
  - direct connect 1Gbps 1 month setup, 18.5d
  - snowball: 2 to 3 in parallel.
~ 1 week
  - on-going replication: site to site, DX, DMS, DataSync
AWS Backup -- managed
  - PITR (for supported services): Point in Time Recovery
  - On-Demand and Scheduled
  - Tag based policies
  - Backup Plans (frequency, window, etc)
-Jan23-------------------------------
Section 27
Lambda, SNS, SQS
  lambda polls
  DLQ: dead letter queue
  SQS FIFO + lambda: blocking -> DLQ
  SNS + Lambda: asynchronous, retry + discard or DLQ
Fan Out Pattern: multiple SQS: combine SQS with an SNS (subscriber SQS)
S3 Events
  event types, possible object filtering
  Use case: thumbnail
  S3 event notifications can take longer, and may be sent multiple times
  Enable versioning
Caching Strategies
  CloudFront caching at the edge -- TTL (balancing)
  API GW also caches (regional)
  Redis, memcached or DAX caching of the app logic
Blocking an IP address
  NACL: Network ACL -- deny rule
  SG of the EC2: only allow rules
  Optional FW in EC2 (CPU cost)
  ALB in between: Connection termination
  NLB: no SG (passthrough)
  ALB + WAF web application FW: complex filtering, but expensive
  CloudFront outside the VPC (public IP): Geo restriction or WAF
HPC: high performance computing
  - very high number of resources -- pay on use
  genomics, chemistry, etc.
  Data Management & Transfer
    AWS DX, snowball, DataSync
  Compute & networking:
    EC2 cpu/gpu optimized, spot instances
    placement groups (same rack/AZ)
    EC2 Enhanced NW SR-IOV
      ENA: elastic nw adapter, up to 100 Gbps
      Intel legacy
    EFA: elastic fabric adapter (Linux) -- MPI: Message Passing Interface
  Storage:
    - EBS
    - instance store
  Network storage: S3, EFS, provisioned, FSx for Lustre
  Automation and orchestration:
    AWS Batch or Parallel Cluster (HPC on AWS)
Highly available EC2
  Public EC2 instance, elastic IP
  standby EC2 instance
  - failover: cloudwatch monitoring (alarm/metric) -> lambda function: start the standby,
    attach the elastic IP to it
  - ASG: Autoscaling group >= 2 AZ
  ASG + EBS volume (locked into an AZ) -> snapshot + tags -> create a new EBS in the other AZ
Bastion host HA
  VPC with 2 AZ, public subnet / private subnet
  NLB accessing a 2nd bastion host
  NLB layer 4: TCP (not only layer 7 http)
-Jan24-------------------------------
Section 28: Other Services
CICD Introduction
  GitHub, BitBucket, CodeCommit
  CodeBuild, Jenkins
  Deploy often
  Automated Deployment
  CodeDeploy, Jenkins CD, Spinnaker
  Code build test deploy provision
  CodeCommit, CodeBuild
  Elastic Beanstalk or CodeDeploy/CloudFormation
  AWS CodePipeline orchestration
Infrastructure as Code: CloudFormation
  Declarative way to outline AWS infrastructure
  No resource manually created
  Each resource tagged -- estimate costs
  Productivity
  Separation of concerns: stacks, apps, layers
  Templates uploaded in S3
  Stacks identified by name
  CloudFormation Designer
  Templates in YAML
  resources declared (mandatory)
  StackSet: all stacks created, updated, or deleted in parallel
Step functions: serverless visual workflow to orchestrate lambda functions
  JSON state machine
SWF: Simple Workflow Service -- code runs on EC2 (older than step functions)
  external signals or child processes
EMR: Elastic MapReduce -- Hadoop Clusters (Big Data)
  analyze and process data
  Hbase, Apache Spark, Presto...
OpsWorks: Chef & Puppet -- managed service (alternative to SSM, Systems Manager)
  'Recipes' == 'Manifests'
AWS WorkSpaces: VDI Virtual Desktop Infrastructure, Secure Cloud Desktop
  On Demand, pay by use
AppSync
  Store and Sync data across mobile and web apps
  Uses GraphQL (from Facebook)
  Alternative to Cognito Sync
Cost Explorer: manage AWS costs
ECS: Elastic Container Service -- docker
ECR: Elastic Container Registry -- repo
Glue: ETL (Extract Transform Load) service