imagenet_classification
Benchmark the performance of loading images from local file systems and classifying them using a GPU.
This script builds the data loading pipeline and instantiates an image classification model on a GPU. The pipeline transfers the batched image data to the GPU concurrently, and the foreground thread runs the model on the batches one at a time.
A file list can be created, for example, by:
cd /data/users/moto/imagenet/
find val -name '*.JPEG' > ~/imagenet.val.flist
To run the benchmark, pass the file list to the script as follows:
python imagenet_classification.py \
    --input-flist ~/imagenet.val.flist \
    --prefix /data/users/moto/imagenet/
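The file list described above can also be produced from Python instead of `find`. The following is a minimal sketch, not part of the script: the helper name `write_flist` is hypothetical, and unlike `find` it sorts the paths so the output is deterministic.

```python
from pathlib import Path


def write_flist(root: Path, subdir: str, output: Path, pattern: str = "*.JPEG") -> int:
    """Write the paths under ``root/subdir`` that match ``pattern``, one per line.

    Paths are written relative to ``root`` so that ``root`` can later be
    supplied to the benchmark through ``--prefix``.
    """
    # Hypothetical helper: collect matching files recursively and sort them.
    paths = sorted(p.relative_to(root).as_posix() for p in (root / subdir).rglob(pattern))
    output.write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

Calling `write_flist(Path("/data/users/moto/imagenet"), "val", Path.home() / "imagenet.val.flist")` would mirror the shell commands shown earlier, assuming the same directory layout.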
Source
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Benchmark the performance of loading images from local file systems and
classifying them using a GPU.

This script builds the data loading pipeline and instantiates an image
classification model on a GPU.
The pipeline transfers the batched image data to the GPU concurrently, and
the foreground thread runs the model on the batches one at a time.

.. include:: ../plots/imagenet_classification_chart.txt

A file list can be created, for example, by:

.. code-block:: bash

   cd /data/users/moto/imagenet/
   find val -name '*.JPEG' > ~/imagenet.val.flist

To run the benchmark, pass it to the script like the following.

.. code-block::

   python imagenet_classification.py
       --input-flist ~/imagenet.val.flist
       --prefix /data/users/moto/imagenet/
"""

# pyre-ignore-all-errors

import contextlib
import logging
import os.path
import re
import time
from collections.abc import Awaitable, Callable, Iterator
from pathlib import Path

import spdl.io
import spdl.utils
import torch
from spdl.dataloader import Pipeline, PipelineBuilder
from torch import Tensor
from torch.profiler import profile

_LG = logging.getLogger(__name__)


__all__ = [
    "entrypoint",
    "benchmark",
    "source",
    "get_decode_func",
    "get_pipeline",
    "get_model",
    "ModelBundle",
    "Classification",
    "Preprocessing",
    "get_mappings",
    "parse_wnid",
]


def _parse_args(args):
    import argparse

    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--input-flist", type=Path, required=True)
    parser.add_argument("--max-samples", type=int, default=float("inf"))
    parser.add_argument("--prefix", default="")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--trace", type=Path)
    parser.add_argument("--queue-size", type=int, default=16)
    parser.add_argument("--num-threads", type=int, default=16)
    parser.add_argument("--no-compile", action="store_false", dest="compile")
    parser.add_argument("--no-bf16", action="store_false", dest="use_bf16")
    parser.add_argument("--use-nvdec", action="store_true")
    parser.add_argument("--use-nvjpeg", action="store_true")
    args = parser.parse_args(args)
    if args.trace:
        # Keep traces small: limit the run to 60 batches when tracing.
        args.max_samples = args.batch_size * 60
    return args


# Hand-roll the transforms so as to support `torch.compile`
class Preprocessing(torch.nn.Module):
    """Perform pixel normalization and data type conversion.

    Args:
        mean: The mean value of the dataset.
        std: The standard deviation of the dataset.
    """

    def __init__(self, mean: Tensor, std: Tensor) -> None:
        super().__init__()
        self.register_buffer("mean", mean)
        self.register_buffer("std", std)

    def forward(self, x: Tensor) -> Tensor:
        """Normalize the given image batch.

        Args:
            x: The input image batch. Pixel values are expected to be
                in the range of ``[0, 255]``.
        Returns:
            The normalized image batch.
        """
        x = x.float() / 255.0
        return (x - self.mean) / self.std


class Classification(torch.nn.Module):
    """Classification()"""

    def forward(self, x: Tensor, labels: Tensor) -> tuple[Tensor, Tensor]:
        """Given a batch of model outputs and labels, count the correct top-1 and top-5 predictions.

        Args:
            x: A batch of model outputs (class scores). The shape is
                ``(batch_size, num_classes)``.
            labels: A batch of labels. The shape is ``(batch_size, 1)``, so that
                it broadcasts against the ``(batch_size, 5)`` top-5 indices.

        Returns:
            A tuple of the number of correct top-1 and top-5 predictions.
        """

        probs = torch.nn.functional.softmax(x, dim=-1)
        top_prob, top_catid = torch.topk(probs, 5)
        top1 = (top_catid[:, :1] == labels).sum()
        top5 = (top_catid == labels).sum()
        return top1, top5


class ModelBundle(torch.nn.Module):
    """ModelBundle()

    Bundle the transform, model backbone, and classification head into a single module
    for simple handling."""

    def __init__(self, model, preprocessing, classification, use_bf16):
        super().__init__()
        self.model = model
        self.preprocessing = preprocessing
        self.classification = classification
        self.use_bf16 = use_bf16

    def forward(self, images: Tensor, labels: Tensor) -> tuple[Tensor, Tensor]:
        """Given a batch of images and labels, count the correct top-1 and top-5 predictions.

        Args:
            images: A batch of images. The shape is ``(batch_size, 3, 224, 224)``.
            labels: A batch of labels. The shape is ``(batch_size, 1)``.

        Returns:
            A tuple of the number of correct top-1 and top-5 predictions.
        """

        x = self.preprocessing(images)

        if self.use_bf16:
            x = x.to(torch.bfloat16)

        output = self.model(x)

        return self.classification(output, labels)


def _expand(vals, batch_size, res):
    return torch.tensor(vals).view(1, 3, 1, 1).expand(batch_size, 3, res, res).clone()


def get_model(
    batch_size: int,
    device_index: int,
    compile: bool,
    use_bf16: bool,
    model_type: str = "mobilenetv3_large_100",
) -> ModelBundle:
    """Build computation model, including transform, model, and classification head.

    Args:
        batch_size: The batch size of the input.
        device_index: The index of the target GPU device.
        compile: Whether to compile the model.
        use_bf16: Whether to use bfloat16 for the model.
        model_type: The type of the model. Passed to ``timm.create_model()``.

    Returns:
        The resulting computation model.
    """
    import timm

    device = torch.device(f"cuda:{device_index}")

    model = timm.create_model(model_type, pretrained=True)
    model = model.eval().to(device=device)

    if use_bf16:
        model = model.to(dtype=torch.bfloat16)

    preprocessing = Preprocessing(
        mean=_expand([0.4850, 0.4560, 0.4060], batch_size, 224),
        std=_expand([0.2290, 0.2240, 0.2250], batch_size, 224),
    ).to(device)

    classification = Classification().to(device)

    if compile:
        with torch.no_grad():
            mode = "max-autotune"
            model = torch.compile(model, mode=mode)
            preprocessing = torch.compile(preprocessing, mode=mode)

    return ModelBundle(model, preprocessing, classification, use_bf16)


def source(
    path: Path,
    prefix: str = "",
    max_samples: int = float("inf"),
) -> Iterator[tuple[str, int]]:
    """Iterate a file containing a list of paths.

    Args:
        path: Path to the file containing list of file paths.
        prefix: Prepended to the paths in the list.
        max_samples: Maximum number of samples to yield.

    Yields:
        The path of the image and its class label.
    """
    class_mapping = get_mappings()

    with open(path) as f:
        i = 0
        for line in f:
            if line := line.strip():
                path_ = prefix + line
                label = class_mapping[parse_wnid(path_)]
                yield path_, label
                if (i := i + 1) >= max_samples:
                    return


def get_decode_func(
    device_index: int,
    width: int = 224,
    height: int = 224,
) -> Callable[[list[tuple[str, int]]], Awaitable[tuple[Tensor, Tensor]]]:
    """Get a function to decode images from a list of paths.

    Args:
        device_index: The index of the target GPU device.
        width: The width of the decoded image.
        height: The height of the decoded image.

    Returns:
        Async function to decode images into a batch tensor of NCHW format
        and labels of shape ``(batch_size, 1)``.
    """
    device = torch.device(f"cuda:{device_index}")

    filter_desc = spdl.io.get_video_filter_desc(
        scale_width=256,
        scale_height=256,
        crop_width=width,
        crop_height=height,
        pix_fmt="rgb24",
    )

    async def decode_images(items: list[tuple[str, int]]):
        paths = [item for item, _ in items]
        labels = [[item] for _, item in items]
        labels = torch.tensor(labels, dtype=torch.int64).to(device)
        buffer = await spdl.io.async_load_image_batch(
            paths,
            width=None,
            height=None,
            pix_fmt=None,
            strict=True,
            filter_desc=filter_desc,
            device_config=spdl.io.cuda_config(
                # Use the requested device rather than a hardcoded index 0.
                device_index=device_index,
                allocator=(
                    torch.cuda.caching_allocator_alloc,
                    torch.cuda.caching_allocator_delete,
                ),
            ),
        )
        batch = spdl.io.to_torch(buffer)
        # Convert NHWC to NCHW.
        batch = batch.permute((0, 3, 1, 2))
        return batch, labels

    return decode_images


def _get_experimental_nvjpeg_decode_function(
    device_index: int,
    width: int = 224,
    height: int = 224,
):
    device = torch.device(f"cuda:{device_index}")
    device_config = spdl.io.cuda_config(
        device_index=device_index,
        allocator=(
            torch.cuda.caching_allocator_alloc,
            torch.cuda.caching_allocator_delete,
        ),
    )

    async def decode_images_nvjpeg(items: list[tuple[str, int]]):
        paths = [item for item, _ in items]
        labels = [[item] for _, item in items]
        labels = torch.tensor(labels, dtype=torch.int64).to(device)
        buffer = await spdl.io.async_load_image_batch_nvjpeg(
            paths,
            device_config=device_config,
            width=width,
            height=height,
            pix_fmt="rgb",
            # strict=True,
        )
        batch = spdl.io.to_torch(buffer)
        return batch, labels

    return decode_images_nvjpeg


def _get_experimental_nvdec_decode_function(
    device_index: int,
    width: int = 224,
    height: int = 224,
):
    device = torch.device(f"cuda:{device_index}")
    device_config = spdl.io.cuda_config(
        device_index=device_index,
        allocator=(
            torch.cuda.caching_allocator_alloc,
            torch.cuda.caching_allocator_delete,
        ),
    )

    async def decode_images_nvdec(items: list[tuple[str, int]]):
        paths = [item for item, _ in items]
        labels = [[item] for _, item in items]
        labels = torch.tensor(labels, dtype=torch.int64).to(device)
        buffer = await spdl.io.async_load_image_batch_nvdec(
            paths,
            device_config=device_config,
            width=width,
            height=height,
            pix_fmt="rgba",
            strict=True,
        )
        # Drop the alpha channel.
        batch = spdl.io.to_torch(buffer)[:, :-1, :, :]
        return batch, labels

    return decode_images_nvdec


def get_pipeline(
    src: Iterator[tuple[str, int]],
    batch_size: int,
    decode_func: Callable[[list[tuple[str, int]]], Awaitable[tuple[Tensor, Tensor]]],
    concurrency: int,
    buffer_size: int,
    num_threads: int,
) -> Pipeline:
    """Build image data loading pipeline.

    The pipeline uses the ``decode_func`` for decoding images concurrently and
    sends the resulting data to the GPU.

    Args:
        src: The source of the data. See :py:func:`source`.
        batch_size: The number of images in a batch.
        decode_func: Async function that decodes a batch of images.
            See :py:func:`get_decode_func`.
        concurrency: The concurrency of the decoding stage. Passed to ``pipe()``.
        buffer_size: The size of the sink buffer. Passed to ``add_sink()``.
        num_threads: The number of threads. Passed to ``build()``.
    """
    return (
        PipelineBuilder()
        .add_source(src)
        .aggregate(batch_size, drop_last=True)
        .pipe(decode_func, concurrency=concurrency)
        .add_sink(buffer_size)
        .build(num_threads=num_threads)
    )


def benchmark(dataloader: Iterator[tuple[Tensor, Tensor]], model: ModelBundle) -> None:
    """The main loop that measures the performance of dataloading and model inference.

    Args:
        dataloader: The dataloader to benchmark.
        model: The model to benchmark.
    """

    _LG.info("Running inference.")
    num_frames, num_correct_top1, num_correct_top5 = 0, 0, 0
    t0 = time.monotonic()
    try:
        for i, (batch, labels) in enumerate(dataloader):
            if i == 20:
                # Exclude the first 20 iterations from the measurement.
                t0 = time.monotonic()
                num_frames, num_correct_top1, num_correct_top5 = 0, 0, 0

            with (
                torch.profiler.record_function(f"iter_{i}"),
                spdl.utils.trace_event(f"iter_{i}"),
            ):
                top1, top5 = model(batch, labels)

                num_frames += batch.shape[0]
                num_correct_top1 += top1
                num_correct_top5 += top5
    finally:
        elapsed = time.monotonic() - t0
        if num_frames != 0:
            num_correct_top1 = num_correct_top1.item()
            num_correct_top5 = num_correct_top5.item()
        fps = num_frames / elapsed
        _LG.info(f"FPS={fps:.2f} ({num_frames}/{elapsed:.2f})")
        acc1 = 0 if num_frames == 0 else num_correct_top1 / num_frames
        _LG.info(f"Accuracy (top1)={acc1:.2%} ({num_correct_top1}/{num_frames})")
        acc5 = 0 if num_frames == 0 else num_correct_top5 / num_frames
        _LG.info(f"Accuracy (top5)={acc5:.2%} ({num_correct_top5}/{num_frames})")


def _get_pipeline(args, device_index) -> Pipeline:
    src = source(args.input_flist, args.prefix, args.max_samples)

    if args.use_nvjpeg:
        decode_func = _get_experimental_nvjpeg_decode_function(device_index)
        concurrency = 7
    elif args.use_nvdec:
        decode_func = _get_experimental_nvdec_decode_function(device_index)
        concurrency = 4
    else:
        decode_func = get_decode_func(device_index)
        concurrency = args.num_threads

    return get_pipeline(
        src,
        args.batch_size,
        decode_func,
        concurrency,
        args.queue_size,
        args.num_threads,
    )


def entrypoint(args: list[str] | None = None):
    """CLI entrypoint. Run pipeline, transform and model and measure its performance."""

    args = _parse_args(args)
    _init_logging(args.debug)
    _LG.info(args)

    device_index = 0
    model = get_model(args.batch_size, device_index, args.compile, args.use_bf16)
    pipeline = _get_pipeline(args, device_index)

    print(pipeline)

    trace_path = f"{args.trace}"
    if args.use_nvjpeg:
        trace_path = f"{trace_path}.nvjpeg"
    if args.use_nvdec:
        trace_path = f"{trace_path}.nvdec"

    with (
        torch.no_grad(),
        profile() if args.trace else contextlib.nullcontext() as prof,
        spdl.utils.tracing(f"{trace_path}.pftrace", enable=args.trace is not None),
        pipeline.auto_stop(timeout=1),
    ):
        benchmark(pipeline.get_iterator(), model)

    if args.trace:
        prof.export_chrome_trace(f"{trace_path}.json")


def _init_logging(debug=False):
    fmt = "%(asctime)s [%(filename)s:%(lineno)d] [%(levelname)s] %(message)s"
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(format=fmt, level=level)


def get_mappings() -> dict[str, int]:
    """Get the mapping from WordNet ID to class index.

    The 1000 IDs from ILSVRC2012 are used. The class indices are the indices of
    the sorted WordNet IDs, which corresponds to most publicly available models.

    Returns:
        Mapping from WordNet ID to class index.

    Example:

    .. code-block::

       >>> class_mapping = get_mappings()
       >>> print(class_mapping["n03709823"])
       636

    """
    class_mapping = {}

    path = os.path.join(os.path.dirname(__file__), "imagenet_class.tsv")
    with open(path, mode="r", encoding="utf-8") as f:
        for line in f:
            if line := line.strip():
                class_, wnid = line.split("\t")[:2]
                class_mapping[wnid] = int(class_)
    return class_mapping


def parse_wnid(s: str) -> str:
    """Parse a WordNet ID (nXXXXXXXX) from string.

    Args:
        s (str): String to parse

    Returns:
        (str): WordNet ID if found, otherwise an exception is raised.
        If the string contains multiple WordNet IDs, the first one is returned.
    """
    if match := re.search(r"n\d{8}", s):
        return match.group(0)
    raise ValueError(f"The given string does not contain WNID: {s}")


if __name__ == "__main__":
    entrypoint()
Functions
- entrypoint(args: list[str] | None = None)
CLI entrypoint. Runs the pipeline, transform, and model, and measures their performance.
- benchmark(dataloader: Iterator[tuple[Tensor, Tensor]], model: ModelBundle) -> None
The main loop that measures the performance of dataloading and model inference.
- Parameters:
dataloader – The dataloader to benchmark.
model – The model to benchmark.
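In the source listing, the loop in `benchmark` restarts its timer and counters at iteration 20, presumably so that start-up costs such as compilation do not skew the reported FPS. The following is a minimal sketch of that measurement pattern using only the standard library; the function name `measure_throughput` and the `warmup` parameter are hypothetical and do not appear in the script.

```python
import time
from collections.abc import Iterable


def measure_throughput(batches: Iterable[list], warmup: int = 20) -> tuple[int, float]:
    """Return the item count and elapsed seconds, ignoring ``warmup`` batches."""
    num_items = 0
    t0 = time.monotonic()
    for i, batch in enumerate(batches):
        if i == warmup:
            # Discard everything measured so far and restart the clock.
            t0 = time.monotonic()
            num_items = 0
        num_items += len(batch)
    return num_items, time.monotonic() - t0


# 25 batches of 4 items: only the last 5 batches are counted.
num_items, elapsed = measure_throughput([[0] * 4 for _ in range(25)])
```

If the input has fewer batches than `warmup`, the reset never happens and every batch is counted, which matches the behavior of the loop in the script.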
- source(path: Path, prefix: str = '', max_samples: int = inf) -> Iterator[tuple[str, int]]
Iterate a file containing a list of paths.
- Parameters:
path – Path to the file containing list of file paths.
prefix – Prepended to the paths in the list.
max_samples – Maximum number of samples to yield.
- Yields:
The path of the image and its class label.
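To illustrate what `source` yields, here is a small self-contained sketch that applies the same steps (skip blank lines, prepend the prefix, look up the label by WordNet ID, stop at `max_samples`) to an in-memory list. The name `iter_samples` and the two-entry mapping are illustrative assumptions; the script itself reads the mapping from `imagenet_class.tsv`.

```python
import re
from collections.abc import Iterable, Iterator


def iter_samples(
    lines: Iterable[str],
    class_mapping: dict[str, int],
    prefix: str = "",
    max_samples: float = float("inf"),
) -> Iterator[tuple[str, int]]:
    """Yield ``(path, label)`` pairs from lines of a file list."""
    count = 0
    for line in lines:
        if line := line.strip():
            path = prefix + line
            match = re.search(r"n\d{8}", path)  # WordNet ID, e.g. n01440764
            if match is None:
                raise ValueError(f"The given string does not contain WNID: {path}")
            yield path, class_mapping[match.group(0)]
            if (count := count + 1) >= max_samples:
                return


# Toy mapping for illustration only.
toy_mapping = {"n01440764": 0, "n01443537": 1}
toy_lines = ["val/n01440764/a.JPEG\n", "\n", "val/n01443537/b.JPEG\n", "val/n01440764/c.JPEG\n"]
samples = list(iter_samples(toy_lines, toy_mapping, prefix="/data/", max_samples=2))
```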
- get_decode_func(device_index: int, width: int = 224, height: int = 224) -> Callable[[list[tuple[str, int]]], Awaitable[tuple[Tensor, Tensor]]]
Get a function to decode images from a list of paths.
- Parameters:
device_index – The index of the target GPU device.
width – The width of the decoded image.
height – The height of the decoded image.
- Returns:
Async function to decode images into a batch tensor of NCHW format and labels of shape (batch_size, 1).
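- get_pipeline(src: Iterator[tuple[str, int]], batch_size: int, decode_func: Callable[[list[tuple[str, int]]], Awaitable[tuple[Tensor, Tensor]]], concurrency: int, buffer_size: int, num_threads: int) -> Pipeline
Build image data loading pipeline.
The pipeline uses decode_func to decode images concurrently and sends the resulting data to the GPU.
- Parameters:
src – The source of the data. See source().
batch_size – The number of images in a batch.
decode_func – Async function that decodes a batch of images. See get_decode_func().
concurrency – The concurrency of the decoding stage. Passed to pipe().
buffer_size – The size of the sink buffer. Passed to add_sink().
num_threads – The number of threads. Passed to build().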
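The shape of this pipeline (group items into fixed-size batches, drop the incomplete last batch, then run an async function on several batches at once) can be sketched with the standard library alone. This is not how SPDL implements `PipelineBuilder`: the names `aggregate`, `run_stage`, and `fake_decode` below are hypothetical, there is no sink buffer or thread pool, and all results are collected before returning.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable, Iterator
from itertools import islice


def aggregate(items: Iterable, batch_size: int, drop_last: bool = True) -> Iterator[list]:
    """Group items into lists of ``batch_size``; optionally drop a short last batch."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        if len(batch) < batch_size and drop_last:
            return
        yield batch


async def run_stage(
    batches: Iterable[list],
    func: Callable[[list], Awaitable[list]],
    concurrency: int,
) -> list[list]:
    """Apply ``func`` to every batch, with at most ``concurrency`` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def _run(batch: list) -> list:
        async with semaphore:
            return await func(batch)

    # gather() returns the results in the order the batches were submitted.
    return await asyncio.gather(*(_run(b) for b in batches))


async def fake_decode(batch: list) -> list:
    """Stand-in for an async decode function."""
    await asyncio.sleep(0)
    return [x * 2 for x in batch]


results = asyncio.run(run_stage(aggregate(range(7), 3), fake_decode, concurrency=2))
```

With seven inputs and a batch size of three, the seventh item is dropped, which is the same effect as `drop_last=True` in the `aggregate` call of the script.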
- get_model(batch_size: int, device_index: int, compile: bool, use_bf16: bool, model_type: str = 'mobilenetv3_large_100') -> ModelBundle
Build computation model, including transform, model, and classification head.
- Parameters:
batch_size – The batch size of the input.
device_index – The index of the target GPU device.
compile – Whether to compile the model.
use_bf16 – Whether to use bfloat16 for the model.
model_type – The type of the model. Passed to timm.create_model().
- Returns:
The resulting computation model.
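The transform that `get_model` constructs in the source listing scales pixel values from `[0, 255]` to `[0, 1]` and then normalizes each channel with the mean and standard deviation passed to `Preprocessing`. A per-pixel sketch of that arithmetic in plain Python follows; the function name `normalize_pixel` is hypothetical, and the script performs the same computation on whole tensors.

```python
# Per-channel statistics as passed to Preprocessing in get_model().
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: tuple[int, int, int]) -> tuple[float, ...]:
    """Scale an RGB pixel from [0, 255] to [0, 1], then normalize per channel."""
    return tuple((value / 255.0 - mean) / std for value, mean, std in zip(rgb, MEAN, STD))


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
```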
- get_mappings() -> dict[str, int]
Get the mapping from WordNet ID to class index.
The 1000 IDs from ILSVRC2012 are used. The class indices are the indices of the sorted WordNet IDs, which corresponds to most publicly available models.
- Returns:
Mapping from WordNet ID to class index.
Example
>>> class_mapping = get_mappings()
>>> print(class_mapping["n03709823"])
636
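The sorted-index convention described for `get_mappings` can be illustrated without the TSV file. The sketch below derives indices by sorting a handful of WordNet IDs; the helper name `build_class_mapping` is hypothetical, and the script itself reads precomputed indices from `imagenet_class.tsv` rather than sorting at runtime.

```python
from collections.abc import Iterable


def build_class_mapping(wnids: Iterable[str]) -> dict[str, int]:
    """Map each WordNet ID to its position in the sorted list of unique IDs."""
    return {wnid: index for index, wnid in enumerate(sorted(set(wnids)))}


# Three IDs given out of order; indices follow the sorted order.
toy_classes = build_class_mapping(["n01443537", "n01440764", "n01484850"])
```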
Classes
- class ModelBundle
Bundle the transform, model backbone, and classification head into a single module for simple handling.
- forward(images: Tensor, labels: Tensor) -> tuple[Tensor, Tensor]
Given a batch of images and labels, count the correct top-1 and top-5 predictions.
- Parameters:
images – A batch of images. The shape is (batch_size, 3, 224, 224).
labels – A batch of labels. The shape is (batch_size, 1).
- Returns:
A tuple of the number of correct top-1 and top-5 predictions.
- class Classification
- forward(x: Tensor, labels: Tensor) -> tuple[Tensor, Tensor]
Given a batch of model outputs and labels, count the correct top-1 and top-5 predictions.
- Parameters:
x – A batch of model outputs (class scores). The shape is (batch_size, num_classes).
labels – A batch of labels. The shape is (batch_size, 1).
- Returns:
A tuple of the number of correct top-1 and top-5 predictions.
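The counting performed by `Classification.forward` can be sketched in plain Python. Because softmax preserves the ordering of scores, ranking the raw scores gives the same top-k indices as ranking the probabilities. The function name `count_topk_correct` is hypothetical, and the inputs here are nested lists rather than tensors.

```python
def count_topk_correct(scores: list[list[float]], labels: list[int], k: int = 5) -> tuple[int, int]:
    """Return the number of samples whose label is the top-1 and within the top-k."""
    top1 = topk = 0
    for row, label in zip(scores, labels):
        # Class indices ordered by descending score, truncated to the best k.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        top1 += int(ranked[0] == label)
        topk += int(label in ranked)
    return top1, topk


# Six classes; class 0 scores highest and class 5 lowest in every row.
toy_scores = [[5.0, 4.0, 3.0, 2.0, 1.0, 0.0]] * 3
counts = count_topk_correct(toy_scores, labels=[0, 3, 5])
```

The first sample is correct at top-1, the second only within the top five, and the third falls outside the top five.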