9N235/9N210 Submunition Object Detector
- Building a 9N235/210 submunition object detector with photography, photogrammetry, 3D-rendering, 3D-printing, and convolutional neural networks
- Posted: Mar 1, 2023. Modified: Mar 26, 2023
This page outlines the development process of building an object detector for the 9N235/210 submunition using photography, photogrammetry, 3D modeling, 3D printing, and convolutional neural networks. For code and models visit github.com/vframeio/vframe. The model is free to use for commercial purposes if the LICENSE and CREDIT information is included (MIT).
- February 2023: Improved detection models (version 1c) released
- October 2022: VFRAME partner Tech 4 Tracing dispatches policy brief (PDF) on using new technology for illicit arms control following our joint presentation at United Nations in summer 2022
Disclaimer: Never approach or handle any munition without explosive ordnance training or supervision from EOD personnel.
This page is still under development and comprises the preliminary version of a research report to be submitted in the next few months.
In spring of 2022 the VFRAME team partnered with Tech 4 Tracing on-site at an explosive ordnance training center in Europe with the goal of capturing photogrammetry scans of free-from-explosive (FFE) submunitions, including the 9N210 submunition (pictured above). Several hundred high-resolution photos were used to reconstruct a millimeter-accurate 3D model of the submunition’s geometry. With the high-fidelity 3D model as a reference, thousands of procedurally randomized photorealistic synthetic training images were generated, annotated, then used to train a convolutional neural network object detection algorithm.
The current 9N235/9N210 object detector model yields a 0.98 F1 score on a custom benchmark dataset with challenging examples including partially occluded, partially exploded, damaged, and dirt-covered munitions in various weather conditions from various camera angles and lenses. The new model (version 1C) was released on February 1, 2023, is available for download with an MIT license at github.com/vframeio/vframe, and improves the overall performance of the previous model (version 1B) released in July 2022.
The current version (1C) performs best on videos or images captured at human height with smartphone cameras, the kind typically posted online to document scenes in conflict zones (i.e. OSINT sources). It is designed to handle artifacts common in online imagery, including watermarks, compression, light motion blur, and varying aspect ratios. An additional version designed for aerial detection is planned for release later this year.
About VFRAME #
VFRAME is a computer vision project that develops open-source technology for human rights research, investigative journalists, and conflict zone monitoring. After several years of research and development into synthetic data fabrication techniques using 3D-rendering and 3D-printed data, this is the first publication of an object detection algorithm that uses all combined methods, as well as sufficient benchmark data to confirm the results.
Many thanks to the organizations that have supported this project during the last several years and to VFRAME’s latest partner Tech 4 Tracing for facilitating access to the FFE munitions, as well as Fenix Insight for additional support and coordination on benchmark data development, SIDA/Meedan for continued operation support, and PrototypeFund for initial research support into synthetic data.
9N235 Submunition #
The 9N210 and 9N235 are high-explosive fragmentation submunitions, also known as cluster munitions. Upon detonation the explosive payload blasts metal fragments in all directions, indiscriminately killing or maiming bystanders including non-combatant civilians. For this reason, cluster munitions are banned in 119 countries by the Convention on Cluster Munitions. Although neither Russia nor Ukraine is a signatory (nor is the United States), they are still bound by international humanitarian law, which prohibits indiscriminate attacks.1
Recent documentation from Ukraine shows 9N235/9N210 submunitions have been widely deployed by Russia. Journalists and human rights researchers have documented dozens of instances where these cluster munitions were used in urban and residential areas. In December 2022 HRW reported that "[s]ince the full-scale invasion of Ukraine, Russian forces have repeatedly used cluster munitions, which are inherently indiscriminate weapons, in attacks that have killed hundreds of civilians and damaged homes, hospitals, and schools."2 Other previous reports show this weapon was also used in attacks in Syria. In March 2022, investigative group Bellingcat noted that the 9N235/9N210 cluster munitions were the “most common type of cluster munition” so far used in Ukraine.
Based on the early observation by Bellingcat, the unique visual appearance of the 9N235/9N210, and the continued high frequency of reports by other research organizations, the 9N235/210 was selected by VFRAME as a preliminary candidate for object detection development. The next step was to analyze existing media reports and documentation to determine how well the object could be detected and whether it would be worth the development.
9N235/9N210 Detectability #
The 9N235 and 9N210 are nearly identical in outward appearance. Each has the same 6 black fins, nose cone, body, dimensions, and approximately the same weight. The main difference is the internal configuration of the warhead. Otherwise, both use the same flawed fuze mechanism resulting in frequent unexploded ordnance (UXO). Even though the munitions contain a secondary self-destruction fuze with a 110-second delay, both have a high failure rate.3 As a result, UXO often remain after an attack, posing a fatal threat to civilians. Below are examples of the submunition appearing in recent documentation from Ukraine.
The combination of its high failure rate, frequent use, and distinctive appearance makes the 9N235/9N210 a good candidate for possible automated detection. The only way to test this hypothesis is to build the object detector and evaluate its performance.
There is a problem, however. Only a limited number of photos of the 9N235/9N210 submunition appear online. When this project started there were only a few dozen. Now there are more, but still only a few hundred, many of which are near duplicates. After splitting these into train/test/val partitions, nowhere near enough data remains for training a robust object detector.
VFRAME is taking a new approach to building neural networks using art-driven, data-centric development. Instead of scraping biased images online or setting up sterile laboratory experiments, data is generated from the ground-truth up using an interdisciplinary combination of photography, photogrammetry, 3D-rendering, 3D-printing, custom software, and artistic replication. This post outlines the steps taken to build a high-performance 9N235/9N210 detector with almost no data from online sources, except for use in the final benchmarking dataset to evaluate the algorithm’s performance.
The first step bypasses the internet as a source of data and instead gains access to the real submunition, which is captured as a 3D model using photogrammetry and becomes the ground-truth source of data.
Step 1: Photogrammetry is the process of using multiple high-resolution photos to reconstruct an object’s 3D geometry and surface texture, via the structure from motion (SfM) technique. Creating 3D scan models of physical objects has become increasingly simplified over the last decade, but there are many trade-offs between different software, camera, and capture approaches. There are also dedicated handheld 3D scanners and smartphone devices that simplify the process further by integrating high-end depth sensors with on-device photogrammetry processing. There is no single best approach. For this project, the goals were high accuracy, portability, and the ability to utilize existing hardware (DSLR camera and GPU workstation).
The most important step is not the technology but finding safe access to a free-from-explosive munition. The munition should also be undamaged, otherwise the damaged areas will become part of the ground-truth geometry. It should also be representative of the object as it appears in conflict zones, and not be significantly altered during the free-from-explosive conversion.
To access the 9N235/9N210 submunition, VFRAME partnered with the NGO Tech 4 Tracing, an international, non-profit partnership working to apply new technologies to arms and ammunition control. In the early spring of 2022, both teams traveled to a weapons training facility in Europe and carried out the photogrammetry capture.
In total about 200 high-resolution photos were used to create the 3D model. An automated turntable was used to expedite the process. Each marker in the graphic below shows the camera position for one photo.
After post-processing the images and completing the 3D reconstruction process, the final result is a millimeter-accurate 3D model. This becomes an ideal ground truth for generating synthetic training data.
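As a rough illustration of how a turntable capture covers an object, the sketch below computes camera positions on rings around an object at the origin. The ring count, photos per ring, elevation angles, and camera distance are all hypothetical values chosen so the total comes to 200; the source states only that about 200 photos and an automated turntable were used.

```python
import math

def turntable_camera_positions(photos_per_ring, elevations_deg, radius_m):
    """Return (x, y, z) camera positions in metres, relative to an object at the origin.

    Rotating the object on a turntable in front of a fixed camera is equivalent,
    in the object's frame, to moving the camera around a ring.
    """
    positions = []
    for elevation in elevations_deg:
        elev = math.radians(elevation)
        for i in range(photos_per_ring):
            azimuth = 2 * math.pi * i / photos_per_ring
            x = radius_m * math.cos(elev) * math.cos(azimuth)
            y = radius_m * math.cos(elev) * math.sin(azimuth)
            z = radius_m * math.sin(elev)
            positions.append((x, y, z))
    return positions

# Hypothetical capture plan: 5 rings of 40 photos at 0.5 m distance.
positions = turntable_camera_positions(
    photos_per_ring=40,
    elevations_deg=[0, 15, 30, 45, 60],
    radius_m=0.5,
)
print(len(positions))  # 200
```

Evenly spaced azimuths with several elevation rings give the overlapping viewpoints that structure from motion relies on to match features between neighboring photos.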
Synthetic Data: 3D Rendered #
Step 2: Synthetic data is an area that VFRAME has focused on heavily since 2019. It’s a game-changing technology for computer vision, especially for detecting rare and dangerous objects such as cluster munitions.
Typically, training images are gathered from online sources or from existing imaging systems such as CCTV. Images are then manually annotated in-studio, or outsourced to annotation click-workers in foreign countries. This creates multiple issues, among them data security, data bias, labor exploitation, cost, and the possibility of errors.
Synthetic data solves many of these problems because the annotations are automatically generated by software, diversity and bias can be controlled for, weather conditions can be programmed, and it can lower the overall cost. To develop the 9N235/9N210 synthetic training dataset, over 10,000 unique images were rendered using various lighting environments, scene compositions, dirt variations, damage variations, and camera lenses. Each factor can be deliberately controlled for.
The way the object appears in the synthetic image is based on observations from the preliminary research. It reflects how the submunition lands, material properties and weathering effects, and the terrain where it’s being documented. Often the submunition is lodged into a soft ground surface with all 6 black tail fins pointing upright. But sometimes the tail fins will break, leaving a metal tube with anywhere from 0 to 6 fins. This can also be modeled, and doing so is important for controlling the confidence levels for false positives.
The example image below was rendered using a 40mm lens, f/5.6 aperture, afternoon lighting, centered on the 9N235 with all 6 fins intact. To improve diversity, every image is procedurally randomized, then manually reviewed to ensure the training data aligns with the expected outcomes.
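The procedural randomization described above can be sketched as sampling one set of scene parameters per rendered image. This is a minimal illustration, not VFRAME's rendering pipeline: the `SceneParams` fields, the value lists, and the `sample_scene` helper are hypothetical, chosen to mirror the factors named in the text (lens, aperture, lighting, dirt, and 0 to 6 intact fins).

```python
import random
from dataclasses import dataclass

@dataclass
class SceneParams:
    focal_length_mm: int   # camera lens
    f_stop: float          # aperture
    lighting: str          # lighting environment
    dirt_amount: float     # 0.0 = clean, 1.0 = fully dirt-covered
    intact_fins: int       # 0 to 6 tail fins remaining

def sample_scene(rng):
    """Draw one randomized scene configuration (hypothetical value ranges)."""
    return SceneParams(
        focal_length_mm=rng.choice([24, 35, 40, 50, 85]),
        f_stop=rng.choice([2.8, 4.0, 5.6, 8.0]),
        lighting=rng.choice(["morning", "noon", "afternoon", "overcast"]),
        dirt_amount=round(rng.uniform(0.0, 1.0), 2),
        intact_fins=rng.randint(0, 6),
    )

# A fixed seed makes the dataset reproducible, so any image can be re-rendered.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(10_000)]
print(len(scenes), scenes[0])
```

Because every parameter is recorded alongside the image, the distribution of each factor can be inspected and rebalanced before rendering, which is what makes each factor deliberately controllable.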
Synthetic Data: 3D Printed #
Step 3: With enough work, 3D-rendered images can achieve convincing photorealism but they still contain artifacts of a simulated world and risk overfitting if the target objects are too rigid or lack diversity. Based on research carried out over the last several years, algorithms trained on synthetic data will always overfit and produce overconfident and misleading results if 3D-rendered synthetic images are used in the test dataset. To escape this curse of simulation, VFRAME has pioneered a hybrid approach that uses 3D-printed data to generate synthetic images in the “real-world”.
3D-printed data refers to the process of creating a 1:1 physical replica of an object using 3D-scanning, 3D-printing, and artistic replication. By recreating the digital surrogate object in the real world it escapes the limitations of 3D-rendered worlds and bridges the gap towards a more real reality. In other words, the 3D-printed replica can now be placed in a controlled staging environment to create scenes that would otherwise be too complex or costly to 3D model.
Another significant advantage of using 3D-printed data for submunitions is safety. Obtaining submunitions always involves risk, and removing the explosive material to make it FFE involves further risk for EOD personnel. The 3D-printed replicas are inert, hollow, plastic, and can even be made using environmentally responsible bioplastics like PLA.
The results are not perfect but can be convincingly real. Below are two photos of 9N235/9N210 submunitions. One is real and one is a replica. Both are covered in mud and photographed with the same camera in wet forest terrain.
Benchmark Data #
With the submunition 3D-modeled, synthetic images 3D-rendered, and 3D-printed models photographed, the next step is to curate the benchmark dataset to evaluate how well, or not, the neural network is able to detect the object.
Benchmark data is essential for understanding the accuracy of the trained object detector. An easy benchmark dataset yields unrealistic expectations for what the detector is capable of. To overcome bias in benchmark data, it’s helpful to spread this task across many seasons, terrains, contributors, and hardware. Images should contain easy, medium, and difficult scenarios. Not only is diversity useful for the model metrics, but it helps communicate to end users how well the detector can be expected to perform when, for example, a munition is partially exploded or broken, or when it will trigger false positives on similar-looking objects. This is especially important for objects that pose a risk to life.
The results also help guide the threshold settings for greedy or conservative deployments, where false positive rates are balanced against higher true positive recall rates. Because the output is always a probabilistic determination, the actual deployment thresholds must be customized to the target environment. For example, a million-scale OSINT application could first triage everything above 90% confidence, then look deeper at lower confidence (70-90%) matches when time permits. The more permissive threshold will usually locate more objects but at the expense of more false positives. In another example, an aerial survey of an attack site could start with a low confidence threshold because the environment is more constrained and any object slightly resembling the target munition could be analyzed further by zooming in.
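The two-tier OSINT triage described above can be sketched as a simple bucketing of detections by confidence. The `triage` function, the queue names, and the sample detections are hypothetical; the 0.90 and 0.70 cut-offs come from the example in the text and would need tuning for a real deployment.

```python
def triage(detections, high=0.90, low=0.70):
    """Split (image_id, confidence) pairs into review queues by confidence."""
    queues = {"review_first": [], "review_later": [], "below_threshold": []}
    for image_id, confidence in detections:
        if confidence >= high:
            queues["review_first"].append(image_id)
        elif confidence >= low:
            queues["review_later"].append(image_id)
        else:
            queues["below_threshold"].append(image_id)
    return queues

# Made-up detections for illustration only.
detections = [("a.jpg", 0.97), ("b.jpg", 0.82), ("c.jpg", 0.41), ("d.jpg", 0.90)]
queues = triage(detections)
print(queues)
# {'review_first': ['a.jpg', 'd.jpg'], 'review_later': ['b.jpg'], 'below_threshold': ['c.jpg']}
```

For a constrained setting such as an aerial survey, the same function could be called with a much lower `low` value so that anything loosely resembling the target is kept for closer inspection.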
Model Metrics #
The model is trained using synthetic data but evaluated using multiple types of real data, including images sourced online. The most common metrics are applied to measure how well it can detect the true positives (recall), how well it avoids false positives (precision), and how precise the bounding boxes are (mAP). Precision and recall are combined into one score, the F1 (their harmonic mean), to give a summarized performance metric.
For this model the F1 score is 0.98 at 0.641 confidence. This means that if you set the confidence threshold to 0.641 in the processing software you should expect high-precision results, with only a few images missed. If you want to detect everything (recall=1.0) the confidence threshold could be dropped to 0.0, but this would trigger more false positives and bring the precision down to 0.2, which might be acceptable in certain scenarios.
Another way to visualize performance is to look at the confusion matrix, which shows the true positive detections (2076) compared to the false positives (23) and false negatives (63). The numbers here are highly dependent on the quality of the benchmark test images.
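As a check on how these numbers relate, the sketch below derives precision, recall, and F1 from the confusion matrix counts quoted above, using the standard definitions. It assumes the reported 0.98 F1 was computed from these same counts, which the source does not state explicitly; the function name is illustrative.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp)   # share of detections that are correct
    recall = tp / (tp + fn)      # share of real objects that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

precision, recall, f1 = detection_metrics(tp=2076, fp=23, fn=63)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.989 recall=0.971 f1=0.980
```

The counts are consistent with the reported score: roughly 1 in 100 detections is a false positive, and roughly 3 in 100 objects are missed.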
Test Images #
Step 6: Finally, the following images show examples from the test set. Use these as a reference to understand the confusion matrix. Consider that when the object is partially occluded by tall grass it can still be detected as long as most of the black fins are visible. Also, the test images are vertical, but most of the synthetic training images were horizontal. It’s important to test all aspect ratios.
When there’s motion blur, the detector should still work well. Here an older camera was used with a poor quality sensor that also produces overexposed areas on the metal. The detector performs well and is also able to ignore the false-positive decoys placed in the scene.
Another test determines if the detector is smart enough to differentiate between submunitions and metal tubes mixed together in a complex scene. Most of the targets are detected with over 90% confidence, but the object in the top middle drops to 75% because the black fins are less prominent here and wet leaves are covering an important detail where the cylinder meets the fin assembly.
Referring back to the 3 images used as reference to evaluate the detectability, these are now used as benchmark data to evaluate the detector’s performance. None of these images were included in the training dataset. The results speak to the power of an artist-driven, data-centric approach to developing neural networks. All 3 objects were easily detected, even the submunition partially visible and still inside the rocket.
The model is trained in multiple architectures for deployment on workstations or mobile/edge devices. Running on a HEDT (high-end desktop) workstation, the nano architecture achieves a maximum of 187 FPS and the full performance (recommended) model reaches 43 FPS.
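To put these frame rates in perspective, the sketch below estimates how long each architecture would take to process one million frames. It is a back-of-the-envelope calculation that assumes the quoted FPS is sustained throughout and ignores image decoding and disk I/O, so real runs would take longer; the function name is illustrative.

```python
def processing_hours(num_frames, fps):
    """Hours needed to process num_frames at a sustained frame rate."""
    return num_frames / fps / 3600

for name, fps in [("nano", 187), ("full", 43)]:
    hours = processing_hours(1_000_000, fps)
    print(f"{name}: {hours:.1f} h per million frames")
# nano: 1.5 h per million frames
# full: 6.5 h per million frames
```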
Future Reports #
- OSINT analysis for 100K image dataset (April 2023)
- OSINT analysis for 1M video dataset (May 2023)
- Adam Harvey: AI/ML systems, synthetic data, object detection, 3D printing
- Josh Evans: photogrammetry, 3D reconstruction, 3D modeling, 3D printing
- Tech 4 Tracing: EOD coordination
- Fenix Insight: additional replica/surrogate fabrication and benchmark dataset collaboration
- The model files are released open-source with an MIT license. They are free to use for commercial systems only if the license is included and distributed with any software deployments.
- Unless otherwise noted, all images are © Adam Harvey / VFRAME.io
- Initial development self-supported by workshops and exhibition fees (2022-2023)
- Advancements and performance improvements supported by Fenix Insight (2023)
- Development of the 9N235/9N210 detector is largely based on several years of research supported by grants from Prototype Fund and SIDA/Meedan (2019-2021)
- Read more about VFRAME’s supporters here
Explosive Ordnance Guide for Ukraine https://www.gichd.org/en/resources/publications/detail/publication/explosive-ordnance-guide-for-ukraine-second-edition/