pyspark.resource.ResourceProfile

class pyspark.resource.ResourceProfile(_java_resource_profile: Optional[py4j.java_gateway.JavaObject] = None, _exec_req: Optional[Dict[str, pyspark.resource.requests.ExecutorResourceRequest]] = None, _task_req: Optional[Dict[str, pyspark.resource.requests.TaskResourceRequest]] = None)

Resource profile to associate with an RDD. A pyspark.resource.ResourceProfile allows the user to specify executor and task requirements for an RDD that will get applied during a stage. This allows the user to change the resource requirements between stages. It is meant to be immutable so that a user cannot change it after building.

New in version 3.1.0.
Notes
This API is evolving.
Examples
Create executor resource requests.
>>> executor_requests = (
...     ExecutorResourceRequests()
...     .cores(2)
...     .memory("6g")
...     .memoryOverhead("1g")
...     .pysparkMemory("2g")
...     .offheapMemory("3g")
...     .resource("gpu", 2, "testGpus", "nvidia.com")
... )
Create task resource requests.
>>> task_requests = TaskResourceRequests().cpus(2).resource("gpu", 2)
Create a resource profile.
>>> builder = ResourceProfileBuilder()
>>> resource_profile = builder.require(executor_requests).require(task_requests).build
Create an RDD with the resource profile.
>>> rdd = sc.parallelize(range(10)).withResources(resource_profile)
>>> rdd.getResourceProfile()
<pyspark.resource.profile.ResourceProfile object ...>
>>> rdd.getResourceProfile().taskResources
{'cpus': <...TaskResourceRequest...>, 'gpu': <...TaskResourceRequest...>}
>>> rdd.getResourceProfile().executorResources
{'gpu': <...ExecutorResourceRequest...>, 'cores': <...ExecutorResourceRequest...>, 'offHeap': <...ExecutorResourceRequest...>, 'memoryOverhead': <...ExecutorResourceRequest...>, 'pyspark.memory': <...ExecutorResourceRequest...>, 'memory': <...ExecutorResourceRequest...>}
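Because each RDD carries its own profile, resource requirements can differ between stages of the same application. A minimal sketch, reusing sc and resource_profile from above; the light_exec, light_tasks, light_profile, heavy_rdd, and light_rdd names and their values are illustrative assumptions, not part of this API:

>>> light_exec = ExecutorResourceRequests().cores(1).memory("1g")  # hypothetical smaller executors
>>> light_tasks = TaskResourceRequests().cpus(1)
>>> light_profile = ResourceProfileBuilder().require(light_exec).require(light_tasks).build
>>> heavy_rdd = sc.parallelize(range(10)).withResources(resource_profile)
>>> light_rdd = heavy_rdd.map(lambda x: x * 2).withResources(light_profile)

Each stage then runs with whichever profile its RDD carries, where the cluster manager supports stage-level scheduling.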
Attributes

executorResources
    Returns a dict of resource name to ExecutorResourceRequest for the executor resources in this profile.

id
    Returns the unique id of this ResourceProfile.

taskResources
    Returns a dict of resource name to TaskResourceRequest for the task resources in this profile.
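A short sketch of reading these attributes from the resource_profile built in the examples above; the id value shown is illustrative, since ids are assigned per application:

>>> resource_profile.id  # unique within the application; exact value varies
1
>>> sorted(resource_profile.taskResources)
['cpus', 'gpu']
>>> sorted(resource_profile.executorResources)
['cores', 'gpu', 'memory', 'memoryOverhead', 'offHeap', 'pyspark.memory']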