WRF on the Cloud

= Objectives =
 
LADCO is seeking to understand the best practices for submitting and managing multiprocessor computing jobs on a cloud computing platform. In particular, LADCO would like to develop a WRF production environment that utilizes cloud-based computing. The goal of this project is to prototype a WRF production environment on a public, on-demand high performance computing service in the cloud to create a WRF platform-as-a-service (PaaS) solution. The WRF PaaS must meet the following objectives:  
 
* Configurable computing and storage to scale, as needed, to meet the needs of different WRF applications
* Configurable WRF options to enable changing grids, simulation periods, physics options, and input data
* Flexible cloud deployment from a command line interface to initiate computing clusters and spawn WRF jobs in the cloud
  
= Call Notes =
== November 28, 2018 ==
=== WRF Benchmarking ===
* Emulating the WRF 2016 12/4/1.3 km grids
* Purpose: estimate costs for CPUs, RAM, and storage
* CPU: 8 cores: a 5.5-day run completes in 4 wall-clock days; 24 cores: 3 days
* RAM: ~22 GB per run (2.5 GB/core)
* Storage
** Tested netCDF4 with compression against netCDF with no compression
** Compression saves a lot of space relative to uncompressed netCDF: output is about 1/3 the size (~70% compression); see the sketch after this list
** The HDF5 and netCDF4 libraries with compression need to be linked into downstream programs
** Estimate about 5.8 TB for the year; grows to 16.9 TB without compression
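The compression comparison above can be exercised outside of WRF itself. The following is a minimal sketch, assuming the netCDF4-python bindings are available; the same conversion can also be done with the stock netCDF <code>nccopy -d 4</code> utility. The file names are hypothetical placeholders.

<syntaxhighlight lang="python">
# Minimal sketch: rewrite an uncompressed WRF output file with deflate (zlib)
# compression via the netCDF4-python library. File names are hypothetical.
from netCDF4 import Dataset

with Dataset("wrfout_d01_2016-01-01_uncompressed.nc") as src, \
     Dataset("wrfout_d01_2016-01-01_compressed.nc", "w", format="NETCDF4") as dst:

    # Copy dimensions, preserving the unlimited (Time) dimension
    for name, dim in src.dimensions.items():
        dst.createDimension(name, None if dim.isunlimited() else len(dim))

    # Copy each variable with compression enabled (complevel 4 is a reasonable default)
    for name, var in src.variables.items():
        out = dst.createVariable(name, var.datatype, var.dimensions,
                                 zlib=True, complevel=4)
        # Copy variable attributes (skip _FillValue, which must be set at creation)
        out.setncatts({a: var.getncattr(a) for a in var.ncattrs()
                       if a != "_FillValue"})
        out[:] = var[:]

    # Copy global attributes
    dst.setncatts({a: src.getncattr(a) for a in src.ncattrs()})
</syntaxhighlight>

Deflate level 4 is a middle ground between compression ratio and write speed; downstream programs only need HDF5/netCDF-4-enabled builds to read the result.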
  
=== Conceptual Approach to WRF on the Cloud ===
 
* Cluster management would launch a head node and compute nodes
* 77 5.5-day chunks: 20 computers for 16 days, or 80 computers for 4 days (see the sketch after this list)
* Head node running constantly
* Compute nodes running over the length of the project
* Memory-optimized machines performed better than compute-optimized machines for CAMx
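The machine-count trade-off above is simple arithmetic on the benchmarking numbers. The sketch below just makes the assumptions explicit: 77 chunks, ~4 wall-clock days per chunk from the 8-core benchmark, and one chunk per machine at a time.

<syntaxhighlight lang="python">
# Back-of-the-envelope schedule estimate for the annual WRF run, using the
# benchmark numbers above: 77 chunks of 5.5 simulated days, ~4 wall-clock
# days per chunk on an 8-core machine, one chunk per machine at a time.
import math

N_CHUNKS = 77
DAYS_PER_CHUNK = 4.0  # wall-clock days per 5.5-day chunk (8-core benchmark above)

def campaign_days(n_machines: int) -> float:
    """Wall-clock days to finish all chunks, one chunk per machine at a time."""
    rounds = math.ceil(N_CHUNKS / n_machines)  # machines work through chunks back to back
    return rounds * DAYS_PER_CHUNK

for n in (20, 80):
    print(f"{n:3d} machines -> ~{campaign_days(n):.0f} wall-clock days "
          f"(~{N_CHUNKS * DAYS_PER_CHUNK:.0f} compute-node days either way)")
# 20 machines -> ~16 wall-clock days; 80 machines -> ~4 wall-clock days
</syntaxhighlight>

Either way the total compute is the same (~308 node-days); the choice mainly trades head-node/queue runtime against how many instances run concurrently.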
=== Storage Analysis ===
* AWS
** Don't want to use local (instance) storage because it will need to be moved/migrated
** Put the data on a storage appliance (S3) while running, then push it off to longer-term storage (Glacier); see the sketch after this list
** Glacier is archival; access must be requested through the console, with response times listed as 1-5 minutes
* Azure
** Fast and slower data lake storage for offline
** Managed disks for online
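A minimal sketch of the AWS flow described above (S3 while running, Glacier afterwards) using boto3. The bucket name, prefix, 30-day transition window, and file name are placeholder assumptions, not decisions from the call.

<syntaxhighlight lang="python">
# Minimal sketch of the AWS flow above: push WRF output to S3 during the run,
# and let a lifecycle rule transition it to Glacier for long-term storage.
# Bucket name, prefix, file name, and the 30-day window are placeholders.
import boto3

BUCKET = "ladco-wrf-output"   # hypothetical bucket name
PREFIX = "wrf2016/"

s3 = boto3.client("s3")

# 1) Upload a finished output file from a compute node
s3.upload_file("wrfout_d01_2016-01-01_compressed.nc",
               BUCKET, PREFIX + "wrfout_d01_2016-01-01_compressed.nc")

# 2) Lifecycle rule: transition objects under the prefix to Glacier after 30 days
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "wrf-output-to-glacier",
            "Filter": {"Prefix": PREFIX},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)
</syntaxhighlight>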
=== Data Transfer Analysis ===
* Estimate based on ~5.8 TB of compressed annual output
* AWS
** Internet transfer will cost ~$928 for 5.5 TB (see the arithmetic sketch after this list)
** Snowball: about 10 days to get the data off via shipped disk; costs ~$200 for the entire WRF run (the smallest appliance was 50 TB)
* Azure
** Online transfer
** Data Box option (similar to Snowball)
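The internet-versus-Snowball choice comes down to arithmetic on volume, effective egress price, and available bandwidth. The sketch below only works through that comparison using the estimates above; the sustained bandwidth is a placeholder assumption, not a measured value.

<syntaxhighlight lang="python">
# Illustration of the internet-vs-Snowball arithmetic. The dollar figures come
# from the estimates above; the sustained bandwidth is a placeholder assumption.
DATA_TB = 5.5               # volume behind the ~$928 internet-transfer estimate
QUOTED_EGRESS_USD = 928.0   # AWS internet egress estimate from above
SNOWBALL_USD = 200.0        # flat Snowball cost from above (~10-day turnaround)
LINK_MBPS = 500             # assumed sustained outbound bandwidth (placeholder)

data_gb = DATA_TB * 1024
implied_rate = QUOTED_EGRESS_USD / data_gb               # effective $/GB in the quote
transfer_days = data_gb * 8 * 1024 / LINK_MBPS / 86400   # GB -> megabits -> days

print(f"Internet: ~${QUOTED_EGRESS_USD:.0f} (~${implied_rate:.2f}/GB), "
      f"~{transfer_days:.1f} days at {LINK_MBPS} Mbps")
print(f"Snowball: ~${SNOWBALL_USD:.0f} flat, ~10 day turnaround (50 TB minimum appliance)")
</syntaxhighlight>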
  
=== Cluster Management Tools (interface analysis) ===
* 3-4 tools seemed to work best across several cloud solutions
* Alces Flight (works on AWS and Azure): used to bring up 40 nodes and set up a Torque queuing system; trouble with using an AMI (need to pay for an AMI with this solution); can use Docker if we want containers, but Ramboll is not positioned to use containers for this project
* CfnCluster: slower development, but it has been reincarnated as AWS ParallelCluster, with improved tools, and it is in the Python Package Index (can be installed with pip); lets you spin everything up from the command line and could be scripted (see the sketch after this list)
* Haven't yet explored AWS ParallelCluster/CfnCluster in detail; similar to experience with StarCluster; seems to be the best solution because you can use your own custom AMI; instance types are independent of the cluster management tools
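Because AWS ParallelCluster installs from PyPI and is driven from the command line, the bring-up and tear-down can be scripted. The sketch below wraps the CLI from Python; it assumes the v2-era <code>pcluster</code> subcommands and a <code>~/.parallelcluster/config</code> that already points at the custom WRF AMI, and the cluster name is hypothetical.

<syntaxhighlight lang="python">
# Sketch of scripting the AWS ParallelCluster CLI (pip install aws-parallelcluster)
# from Python, assuming the v2-era "pcluster" subcommands and a
# ~/.parallelcluster/config that already names the custom WRF AMI.
import subprocess

CLUSTER = "ladco-wrf-test"   # hypothetical cluster name

def run(cmd):
    """Echo a command, run it, and fail loudly if it returns nonzero."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Bring up the head node and compute fleet described in the config
run(["pcluster", "create", CLUSTER])

# Interactive login to the head node; WRF chunk jobs are then submitted to the
# cluster's scheduler (e.g. with qsub) from there
run(["pcluster", "ssh", CLUSTER])

# Once the chunk output has been pushed off to S3, tear the cluster down
run(["pcluster", "delete", CLUSTER])
</syntaxhighlight>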
=== Next Steps ===
* LADCO to create a WRF AMI on AWS: WRF 3.9.1, netCDF4 with compression, MPICH2, PGI compiler, AMET
* LADCO to create a login for Ramboll in our AWS organization
* Ramboll to explore AWS ParallelCluster and then prototype with the LADCO WRF AMI
* Next call 12/5 @ 3 Central
