Evaluating Neural Radiance Fields (NeRFs) for 3D Plant Geometry Reconstruction in Field Conditions (2024)

Muhammad Arbab Arshad¹, Talukder Jubery², James Afful², Anushrut Jignasu², Aditya Balu², Baskar Ganapathysubramanian², Soumik Sarkar¹, Adarsh Krishnamurthy²

¹Department of Computer Science, Iowa State University, Ames, USA.
²Department of Mechanical Engineering, Iowa State University, Ames, USA.

Abstract

We evaluate different Neural Radiance Field (NeRF) techniques for the 3D reconstruction of plants in varied environments, from indoor settings to outdoor fields. Traditional methods usually fail to capture the complex geometric details of plants, which is crucial for phenotyping and breeding studies. We evaluate the reconstruction fidelity of NeRFs in three scenarios of increasing complexity and compare the results against point clouds obtained using LiDAR as ground truth. In the most realistic field scenario, the NeRF models achieve a 74.6% F1 score after 30 minutes of training on the GPU, highlighting the efficacy of NeRFs for 3D reconstruction in challenging environments. Additionally, we propose an early stopping technique for NeRF training that almost halves the training time at the cost of only a 7.4% reduction in the average F1 score. This optimization significantly enhances the speed and efficiency of 3D reconstruction using NeRFs. Our findings demonstrate the potential of NeRFs for detailed and realistic 3D plant reconstruction and suggest practical approaches for making NeRF-based reconstruction faster and more efficient.

Keywords: Neural Radiance Fields — 3D Reconstruction — Field Conditions

1 Introduction

In recent years, reconstructing 3D geometry has emerged as a critical area within plant sciences. As global challenges in food production become increasingly complex[1], gaining a detailed understanding of plant structures has become essential. This goes beyond mere visual representation; capturing the intricate details of plant geometry provides valuable insights into growth, responses to environmental stressors, and physiological processes[2, 3]. Consequently, there have been several efforts toward the 3D reconstruction of plants[4, 5, 6].

[Figure 1]

One of the most common approaches to 3D reconstruction is photogrammetry, which relies on the analysis of discrete 2D pixels using techniques such as structure from motion (SfM)[7] and multi-view stereo (MVS)[8]. A more direct approach is to use a LiDAR scanner (such as the FARO 3D LiDAR scanner) to capture a dense 3D point cloud of the plants; this has been successfully used for the 3D reconstruction of maize[9] and tomato[10] plants. Contemporary 3D modeling techniques for plant structures face significant challenges when attempting to capture the minute details inherent in plants[2]. The complexity of plants, from delicate leaf venation[11] to intricate branching patterns[12], necessitates models that encompass these specific details. Scans from multiple angles are essential to capture every detail, which is challenging since multiple LiDAR scans are time-consuming. Due to the limited poses, this approach does not scale well to capturing minute details in large scenes; consequently, some desired details may be missed in the final model. Andújar et al. [13] have emphasized that, even with advanced sensors, gaps remain in detailed reconstruction. They also point out that while devices such as the MultiSense S7 from Carnegie Robotics combine lasers, depth cameras, and stereo vision to offer reasonable results, the high acquisition costs can be prohibitive. At the same time, while photogrammetry is adept at large-scale reconstruction, it often cannot capture the subtle details of plants[14, 9, 10].

In addition to the challenges above, the dynamic nature of flexible objects such as plants and their environment introduces added complexity. Plants, unlike static entities, grow, move in reaction to environmental stimuli such as wind, and exhibit both diurnal and seasonal variations. This environmental dynamism, coupled with plant behavior, further complicates modeling efforts. The comprehensive investigation by Paturkar et al. [14] underscores that such dynamism inherently complicates the attainment of precise 3D models. Factors such as persistent growth, environmental dynamism, and external perturbations, notably in windy conditions, jeopardize the consistency of data acquisition during imaging[15, 16]. Liénard et al. [17] highlight that errors in post-processing UAV-based 3D reconstructions can have severe, irreversible consequences. This complexity necessitates innovative solutions in 3D modeling and data processing.

One of the most recent approaches for 3D reconstruction is Neural Radiance Fields (NeRFs). At their core, NeRFs utilize deep learning to synthesize continuous 3D scenes by modeling the complete volumetric radiance field[18]. NeRFs enable the rendering of photorealistic scenes from any viewpoint using a neural network trained on a set of 2D images, without requiring explicit 3D geometry or depth maps. NeRFs use implicit representations of the volumetric scene, in contrast to explicit representations such as point clouds in SfM and voxel grids in MVS. The implicit representation utilized by NeRF is resolution invariant, allowing for more detailed and granular modeling without the constraints of resolution-dependent methods. The versatility and rapid adoption of NeRFs as a state-of-the-art technique in computer vision and graphics underscore their significance, with applications ranging from virtual reality[19] to architectural reconstruction[20]. In plant science research particularly, NeRF's ability to capture fine details offers the potential for deep insights into plant structures and could make it a vital tool in plant phenotyping and breeding (see Figure 1).

These factors indicate that the challenges in capturing detailed plant structures remain, even with sophisticated sensors, and financial considerations exacerbate them further. Traditional 3D modeling techniques often fall short of accurately capturing the complex 3D structures of plants[21]. Although direct techniques such as LiDAR scanning provide better accuracy, their high costs often render them inaccessible to many researchers. Tang et al. [22] note that the financial commitment associated with such advanced equipment, combined with the specialized expertise required to operate it, limits adoption within academic and enthusiast domains.

In this paper, we perform a detailed evaluation of NeRF methodologies to assess their applicability and effectiveness for high-resolution 3D reconstruction of plant structures. An essential part of our study involves a comparative analysis of different NeRF implementations to determine the most effective framework for specific plant modeling needs. This includes assessing the methods’ fidelity, computational efficiency, and ability to adapt to changes in environmental conditions. Such comparative analysis is crucial for establishing benchmarks for NeRF’s current capabilities and identifying future technological improvement opportunities. Building on this foundation, we introduce an early-stopping algorithm to preemptively terminate the training process, significantly reducing computational cost while retaining model precision. We summarize our contributions as follows:

  1. A dataset collection encompassing a wide range of plant scenarios for reconstruction purposes, consisting of images, camera poses, and ground-truth TLS scans.

  2. An evaluation of state-of-the-art NeRF techniques across different 2D and 3D metrics, offering insights for further research.

  3. An early stopping algorithm to efficiently halt NeRF training when improvements in model fidelity no longer justify computational costs, ensuring optimal resource use.

  4. The development of an end-to-end 3D reconstruction framework using NeRFs designed specifically for the 3D reconstruction of plants.

Our research aims to explore the feasibility of NeRFs for the 3D reconstruction of plants, offering an in-depth analysis. A pivotal aspect of our methodology is the use of low-cost mobile cameras for data acquisition. By leveraging the widespread availability and imaging capabilities of modern smartphones, we make high-quality image data collection more accessible and cost-effective. This approach, combined with the NeRFs' ability to process varied image datasets for 3D reconstruction, can revolutionize plant reconstruction efforts.

The rest of the paper is arranged as follows. In Section 2, we outline the dataset collection, NeRF implementations, evaluation methods, and the LPIPS-based early-stopping algorithm. In Section 3, we analyze results from single- and multiple-plant scenarios, both indoors and outdoors, using critical performance metrics. In Section 4, we provide a theoretical discussion of the sampling strategies of different NeRF implementations and examine their impact on performance. We conclude in Section 5.

2 Materials and Methods

To evaluate 3D plant reconstruction using NeRFs, we propose a comprehensive methodology encompassing data collection, NeRF implementations, evaluation metrics, and an early stopping algorithm. The overall workflow of the different steps of our framework is shown in Figure 2.

[Figure 2: Overall workflow of the framework, from reconstruction to evaluation]

2.1 Evaluation Scenarios and Data Collection

[Figure 3: Data collection scenarios: (a) single corn plant indoor, (b) multiple corn plants indoor, (c) corn plants in the field]

We evaluate NeRFs in three distinct scenarios with ground-truth data, ranging from a controlled indoor setting to a dynamic outdoor environment, plus a final testing scenario. The four scenarios are:

  1. Single Corn Plant Indoor: This serves as the simplest test case. A solitary corn plant is placed in a controlled indoor environment, with lighting, background, and other environmental factors kept constant. The objective is to assess the basic capability of NeRFs to reconstruct an individual plant structure[23] (see Figure 3(a)).

  2. Multiple Corn Plants Indoor: Here, several corn plants are situated in an indoor setting. The increased complexity due to multiple plants poses a greater challenge for 3D reconstruction, with inter-plant occlusions and varying plant orientations adding a further layer of complexity (see Figure 3(b)).

  3. Multiple Corn Plants in a Field with Other Plants: This scenario represents a real-world agricultural field, where corn plants are interspersed with other plant species. The added complexity of variable lighting, wind, and other dynamic environmental conditions tests the robustness of NeRFs (see Figure 3(c)). We selected a row plot of corn plants spaced approximately 0.2 m apart, at roughly the V12 stage; the leaves of neighboring plants overlap.

  4. In-field Test Data: For validating the proposed early stopping methodology, a diverse dataset was assembled, featuring scenarios with soybean, Anthurium hookeri, a mixture of plants, Cymbidium floribundum, and Hydrangea paniculata (see Figure 8).

Our NeRF training dataset is sourced from RGB images and LiDAR data captured with a mobile phone; the RGB images drive the 3D reconstruction of the plants, while the LiDAR is used exclusively for pose capture. For all three scenarios, data are captured with an iPhone 13 Pro at 4K resolution. The device is held at a constant height while circling the plant to ensure consistent capture angles. Data collection uses the Polycam app[24], taking approximately 2.5 minutes for Scenario 3 (multiple plants outdoors) and around 1 minute for Scenario 1 (single plant indoors). To establish accurate ground truth, we use high-definition terrestrial LiDAR scans from a Faro® Focus S350 scanner. The scanner has an angular resolution of 0.011 degrees, equating to 1.5 mm point spacing over a 10 m scanning range, and can acquire point clouds of up to 700 million points at 1 million points per second. The scanner also includes a built-in RGB camera that captures 360-degree images once scanning is complete.

In both indoor and outdoor settings, we scan the plants from four (single plant) to six (multiple plants) locations around the plant(s), at a height of 1.5 m and a distance of 1.5 m from the plant(s). To reduce leaf movement during scanning, we ensured there was no airflow around the plants indoors, and outdoors we waited for a time with negligible wind (August 31, 2023, at 8:30 a.m.). Each scan took approximately 2.5 minutes, for a total capture time of around 18 minutes outdoors, including manually moving the scanner around the plot. The six scans were processed in SCENE® software to add RGB color to the point clouds, followed by registration of the clouds by minimizing cloud-to-cloud and top-view distances. Afterward, we cropped the area of interest from the registered point cloud, removed duplicate points, and reduced noise using statistical outlier removal based on global and local point-to-point distance distributions. The resulting point cloud has an average resolution of about 7 mm. This experimental setup exposes the NeRF algorithms to a range of complexities, from controlled environments to dynamic, real-world conditions.
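As an illustration, the duplicate-removal and statistical-outlier-removal steps can be reproduced with Open3D along the following lines; the file paths and the neighbor/deviation parameters are placeholders rather than the exact values used here.

```python
import open3d as o3d

# hypothetical path to the registered, cropped TLS point cloud
pcd = o3d.io.read_point_cloud("registered_scans.ply")
pcd = pcd.remove_duplicated_points()

# statistical outlier removal: drop points whose mean distance to their
# nb_neighbors nearest neighbors deviates from the global average by more
# than std_ratio standard deviations
clean, kept_idx = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
o3d.io.write_point_cloud("ground_truth_clean.ply", clean)
```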

Camera pose estimation is a crucial second step, typically achieved through a Structure from Motion (SfM) pipeline such as COLMAP[25]. This process is essential for obtaining accurate 3D structures from sequences of images by determining correspondences between feature points and by using sequential matching, especially effective since our dataset comprises video frames.
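For reference, such a pipeline can be scripted against the COLMAP command-line interface roughly as follows; the database and directory paths are placeholders, and flags should be verified against the installed COLMAP version.

```python
import subprocess

def colmap(*args):
    # thin wrapper that fails loudly if any COLMAP stage errors out
    subprocess.run(["colmap", *args], check=True)

# 1. detect and describe feature points in the extracted video frames
colmap("feature_extractor", "--database_path", "scene.db", "--image_path", "frames/")
# 2. sequential matching, exploiting the temporal order of video frames
colmap("sequential_matcher", "--database_path", "scene.db")
# 3. incremental SfM: recover camera poses and a sparse 3D structure
colmap("mapper", "--database_path", "scene.db", "--image_path", "frames/",
       "--output_path", "sparse/")
```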

2.2 Neural Radiance Fields (NeRFs)

Neural Radiance Fields (NeRFs) model a scene as a continuous function mapping a 3D position $\mathbf{x}=(x,y,z)$ and a 2D viewing direction $\mathbf{d}=(\theta,\phi)$ to a color $\mathbf{c}=(r,g,b)$ and density $\sigma$. The function is parameterized by a neural network $F_\theta$, expressed as:

$$(\mathbf{c},\sigma)=F_\theta(\mathbf{x},\mathbf{d}) \qquad (1)$$

Rendering an image involves integrating the color and density along camera rays, a process formalized as:

$$\mathbf{C}(\mathbf{r})=\int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt \qquad (2)$$

where $T(t)=\exp\!\left(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\right)$ represents the accumulated transmittance along the ray $\mathbf{r}(t)=\mathbf{o}+t\mathbf{d}$, with $\mathbf{o}$ being the ray origin and $[t_n,t_f]$ the near and far bounds. In our workflow, we incorporate state-of-the-art NeRF implementations optimized for their 3D reconstruction capabilities, which are critical for enabling large-scale plant phenotyping studies. Specifically, we employ Instant-NGP[26], TensoRF[27], and NeRFacto[28].
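In practice, the integral in Eq. (2) is approximated by quadrature over discrete samples along each ray. The following minimal NumPy sketch (our illustration, not any particular implementation) composites per-sample densities and colors into a pixel color using the standard alpha-compositing weights:

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Discretized Eq. (2): composite samples along one ray into an RGB value."""
    deltas = np.diff(t_vals)                      # spacing between samples
    alphas = 1.0 - np.exp(-sigmas[:-1] * deltas)  # per-segment opacity
    # transmittance T_i: probability the ray reaches sample i unoccluded
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = trans * alphas
    return weights @ colors[:-1]                  # (3,) rendered RGB

# toy example: 64 samples along a ray through a random density/color field
t = np.linspace(2.0, 6.0, 64)
rgb = render_ray(np.random.rand(64), np.random.rand(64, 3), t)
print(rgb)
```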

We specifically chose Instant-NGP, TensoRF, and NeRFacto for plant reconstruction because these implementations are more efficient, achieving results comparable to vanilla NeRF roughly 50 times faster. Each introduces several new features over the vanilla NeRF implementation. Instant-NGP introduces a small neural network complemented by a multiresolution hash table, optimizing the number of operations required for training and rendering[26]. TensoRF, on the other hand, conceptualizes the radiance field as a 4D tensor and applies tensor decomposition to achieve better rendering quality and faster reconstruction times than the traditional NeRF approach[27]. NeRFacto combines various techniques, such as the multilayer perceptron (MLP) adapted from Instant-NGP and the proposal network sampler from MipNeRF-360[29]. Apart from these three methods, we also tried the vanilla Mip-NeRF[30]. Unfortunately, Mip-NeRF failed to reconstruct more complicated 3D scenes (such as Scenario II) in our testing; please refer to the Supplement, where we provide a table of MipNeRF training over time. We briefly describe the three tested NeRF approaches below.

Instant-NGP: Instant-NGP advances NeRFs through three key improvements: enhanced sampling via occupancy grids, a streamlined neural network architecture, and a multi-resolution hash encoding. The hallmark of Instant-NGP is its multi-resolution hash encoding, which maps input coordinates to trainable feature vectors stored at multiple resolutions. For each input coordinate, the method hashes the surrounding voxel vertices, retrieves and interpolates the corresponding feature vectors, and feeds the interpolated vectors into the neural network. This enhances the model's ability to learn complex geometries and yields a smoother function due to the trainable nature of the feature vectors. The overall design drastically accelerates NeRF training and rendering, enabling near real-time processing and speedups of up to 1000×. The method also employs multiscale occupancy grids to efficiently bypass empty space and areas beyond dense media during sampling, thereby reducing the computational load; these grids are updated dynamically as the scene's geometry is learned, increasing sampling efficiency. In parallel, Instant-NGP adopts a compact, fully-fused neural network optimized to run within a single CUDA kernel, consisting of only four layers with 64 neurons each, which alone yields a 5-10× speedup over traditional NeRF implementations.
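To make the hash-encoding step concrete, the following is a minimal NumPy sketch of a multi-resolution hash lookup with trilinear interpolation. The table size, level count, and growth factor are illustrative assumptions, not the paper's configuration; only the hashing primes come from the Instant-NGP paper.

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)  # Instant-NGP primes

def hash_encode(x, table, n_res=4, base_res=16, growth=1.5):
    """Encode one 3D point x in [0,1)^3 with a multi-resolution hash grid.

    table: (n_res, T, F) trainable feature vectors; returns (n_res * F,) features.
    """
    T = table.shape[1]
    feats = []
    for level in range(n_res):
        res = int(base_res * growth ** level)     # grid resolution at this level
        pos = x * res
        lo = np.floor(pos).astype(np.uint64)
        frac = pos - lo
        f = np.zeros(table.shape[2])
        # trilinear interpolation over the 8 surrounding voxel corners
        for corner in range(8):
            offs = np.array([(corner >> i) & 1 for i in range(3)], dtype=np.uint64)
            v = lo + offs
            h = int(np.bitwise_xor.reduce(v * PRIMES) % T)   # spatial hash index
            w = np.prod(np.where(offs == 1, frac, 1.0 - frac))
            f += w * table[level, h]
        feats.append(f)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
table = rng.normal(0.0, 1e-4, size=(4, 2**14, 2))   # tiny table, for illustration
print(hash_encode(np.array([0.3, 0.7, 0.1]), table).shape)  # (8,)
```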

TensoRF: TensoRF models the radiance field as a 4D tensor over a 3D voxel grid, where each voxel carries multi-channel features. The model leverages tensor decomposition to manage this high-dimensional data efficiently, using two key techniques: Canonical Polyadic (CP) and Vector-Matrix (VM) decomposition. CP decomposition factorizes the tensor into rank-one components built from compact vectors, reducing the model's memory footprint. VM decomposition instead factorizes the tensor into compact vector and matrix factors, striking a balance between memory efficiency and detail capture. Together these enable TensoRF to reduce memory requirements while improving rendering quality and accelerating reconstruction. Compared with conventional NeRF approaches, CP decomposition yields faster scene reconstruction, improved rendering quality, and a smaller model; VM decomposition goes further, offering even better rendering quality and quicker reconstruction within a compact model size.
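As a toy illustration of why CP decomposition saves memory, the sketch below (names and sizes are ours, for illustration) represents an N×N×N feature grid by three R×N factor matrices: each voxel value is a sum of R rank-one terms, so storage drops from N³ to 3RN numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
R, N = 16, 64                                  # rank and grid resolution (illustrative)
vx, vy, vz = (rng.standard_normal((R, N)) for _ in range(3))

def voxel_value(i, j, k):
    # CP reconstruction of grid entry (i, j, k): sum of R rank-one products
    return float(np.sum(vx[:, i] * vy[:, j] * vz[:, k]))

print(voxel_value(3, 10, 42))
print("storage:", 3 * R * N, "numbers vs dense", N ** 3)
```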

NeRFacto: NeRFacto is an aggregate of techniques optimized for rendering static scenes from real images. The model extends the NeRF framework with pose refinement and advanced sampling strategies to improve reconstruction fidelity. Pose refinement is critical when initial camera poses are imprecise, as is often the case with mobile capture; NeRFacto refines these poses, mitigating artifacts and enhancing detail. The model employs a piecewise sampler for initial scene sampling, allocating samples to cover both near and distant objects. This is refined by a proposal sampler, which concentrates samples in the regions that contribute most to the scene's appearance, guided by a density function derived from a small fused MLP with hash encoding. This design ensures efficient sampling and better reconstruction. Further explanation, and a contrast with Instant-NGP, is given in the discussion section. The implementations of the aforementioned algorithms are taken from the open-source project Nerfstudio[28].
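Since all three implementations are run through Nerfstudio, training can be scripted along the following lines. This is a hedged sketch: the dataset path is a placeholder, and the flag spelling should be checked against the installed Nerfstudio version.

```python
import subprocess

scene = "data/corn_single"   # placeholder: processed frames plus COLMAP poses

# train each evaluated method on the same scene for the full iteration budget
for method in ["instant-ngp", "tensorf", "nerfacto"]:
    subprocess.run(["ns-train", method, "--data", scene,
                    "--max-num-iterations", "30000"], check=True)
```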

Table 1: NeRF variants evaluated in recent comparative studies (✓ = evaluated, × = not evaluated).

| Paper | Instant-NGP | NeRFacto | TensoRF | NeRF | Additional Methods |
|---|---|---|---|---|---|
| Azzarelli et al. [31] | ✓ | ✓ | × | × | Mip-NeRF |
| Radl et al. [32] | ✓ | ✓ | × | × | Mip-NeRF |
| Li et al. [33] | ✓ | × | × | ✓ | NSVF, PlenOctree, KiloNeRF, DIVeR |
| Remondino et al. [34] | ✓ | ✓ | ✓ | × | MonoSDF, VolSDF, NeuS, UniSurf |
| Balloni et al. [35] | ✓ | × | × | × | - |
| Ours | ✓ | ✓ | ✓ | × | - |

There have been several recent works comparing NeRF approaches for 3D reconstruction; Table 1 summarizes some of them. Some of these works also employ additional methods to improve reconstruction fidelity. For example, SteerNeRF[33] utilizes neural sparse voxel fields (NSVF)[36], KiloNeRF[37], PlenOctree[38], and DIVeR[39] to obtain smooth rendering from different viewpoints. NSVF introduces a fast, high-quality, viewpoint-free rendering method using a sparse voxel octree for efficient scene representation. KiloNeRF accelerates NeRF's rendering by three orders of magnitude using thousands of tiny MLPs, maintaining visual quality with efficient training. PlenOctree uses an octree data structure to store the plenoptic function. DIVeR improves upon NeRF by using deterministic estimates for volume rendering, allowing realistic 3D rendering from few images. Similar to our work, Azzarelli et al. [31] propose a framework for evaluating NeRF methods using Instant-NGP, NeRFacto, and Mip-NeRF, focusing on neural rendering isolation and parametric evaluation. Radl et al. [32] analyze trained vanilla NeRFs, Instant-NGP, NeRFacto, and Mip-NeRF, showing accelerated computation by transforming activation features and reducing computations by 50%.

Remondino et al. [34] analyze image-based 3D reconstruction, comparing different NeRFs (including Instant-NGP, NeRFacto, TensoRF, MonoSDF[40], VolSDF[41], NeuS[42], and UniSurf[43]) with traditional photogrammetry, highlighting their applicability and performance differences for reconstructing heritage scenes and monuments. Balloni et al. [35] do the same but using only Instant-NGP. Each of these NeRF implementations advances vanilla NeRF in some way. MonoSDF demonstrates that incorporating monocular geometry cues improves the quality of neural implicit surface reconstruction. VolSDF improves the volume rendering of signed distance fields (SDFs) using a new density representation. NeuS introduces a bias-free volume rendering method for neural surface reconstruction, outperforming existing techniques in handling complex structures and self-occlusions. UniSurf combines implicit surface models and radiance fields, enhancing 3D reconstruction and novel view synthesis without input masks.

2.3 3D Registration

We reconstruct the scene and capture point clouds using a FARO scan for ground truth. 3D registration, or alignment, is crucial for a one-to-one comparison between the NeRF-based reconstruction and the ground truth. Our alignment and evaluation methodology is adapted from Knapitsch et al. [44], who evaluate different pipelines and use COLMAP as an 'arbitrary reference' frame. In our case, however, all the NeRFs use COLMAP in their pipeline, so the reference and reconstruction frames coincide. The steps used for registration are:

  • Preliminary Camera Trajectory Alignment: The NeRF-reconstructed point cloud is manually aligned with the ground truth using point-based alignment. Four corresponding points are selected in both point clouds to compute an initial transformation matrix. This matrix aligns the camera poses, providing initial scale and orientation estimates. This initial coarse-grained alignment step paves the way for more detailed alignment procedures.

  • Cropping: Each ground-truth model has a manually-defined bounding volume, outlining the evaluation region for reconstruction.

  • Iterative Closest Point (ICP) Registration: Drawing inspiration from the iterative refinement process detailed by Besl and McKay [45] and further refined by Zhang [46], we adopt a three-stage approach[44] as our initial registration framework. The process begins with a specified voxel size and an associated threshold for the initial registration. The next iteration starts from the previous transformation result, with the voxel size halved to register finer detail. The third stage refines the alignment further by returning to the original voxel size and adjusting the threshold to facilitate convergence at each stage. This multi-scale strategy is designed to capture both coarse and fine details, thereby improving the accuracy and precision of the model alignment. However, in our adaptation for plant structure reconstruction, we diverged from Knapitsch et al. [44] by keeping the iterative process within a single stage rather than expanding across multiple stages: we found that increasing the iteration count tenfold, rather than the number of stages, prevented the registration process from collapsing[47]. A minimal sketch of this step follows the list.
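The single-stage, iteration-heavy ICP variant can be sketched with Open3D as follows; the voxel size, correspondence threshold, and iteration budget are illustrative assumptions, not the exact values used in our experiments.

```python
import numpy as np
import open3d as o3d

def register(source, target, voxel, threshold, init=np.eye(4)):
    """One-stage point-to-point ICP at a fixed scale with a large iteration budget."""
    src = source.voxel_down_sample(voxel)
    tgt = target.voxel_down_sample(voxel)
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, threshold, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=300))
    return result.transformation

# nerf_pcd, tls_pcd: clouds after the manual point-based pre-alignment, e.g.
# nerf_pcd = o3d.io.read_point_cloud("nerf_reconstruction.ply")
# tls_pcd  = o3d.io.read_point_cloud("ground_truth_clean.ply")
# T = register(nerf_pcd, tls_pcd, voxel=0.01, threshold=0.02)
```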

2.4 Evaluation Metrics

To assess the similarity between the ground truth (obtained from TLS) and the reconstructed 3D point cloud, the following metrics are employed:

  1. Precision/Accuracy. Given a reconstructed point set $\mathcal{R}$ and a ground-truth set $\mathcal{G}$, the precision metric $P(d)$ assesses the proximity of points in $\mathcal{R}$ to $\mathcal{G}$ within a distance threshold $d$. Mathematically, it is formulated as:

     $$P(d)=\frac{100}{|\mathcal{R}|}\sum_{\mathbf{r}\in\mathcal{R}}\mathbb{I}\left(\min_{\mathbf{g}\in\mathcal{G}}\|\mathbf{r}-\mathbf{g}\|<d\right), \qquad (3)$$

     where $\mathbb{I}(\cdot)$ is an indicator function. Precision ranges from 0 to 100, with higher values indicating better performance.

  2. Recall/Completeness. Conversely, the recall metric $R(d)$ quantifies how well the reconstruction $\mathcal{R}$ encompasses the points of the ground truth $\mathcal{G}$ for a given distance threshold $d$. It is defined as:

     $$R(d)=\frac{100}{|\mathcal{G}|}\sum_{\mathbf{g}\in\mathcal{G}}\mathbb{I}\left(\min_{\mathbf{r}\in\mathcal{R}}\|\mathbf{g}-\mathbf{r}\|<d\right). \qquad (4)$$

     Recall ranges from 0 to 100, with higher values indicating better performance. Both metrics are extensively utilized in recent studies[35, 48].

  3. F-score. The F-score $F(d)$ serves as a harmonic summary measure that encapsulates both the precision $P(d)$ and recall $R(d)$ for a given distance threshold $d$. It is specifically designed to penalize extreme imbalances between the two:

     $$F(d)=\frac{2\,P(d)\,R(d)}{P(d)+R(d)}. \qquad (5)$$

     The harmonic nature of the F-score ensures that if either $P(d)$ or $R(d)$ approaches zero, the F-score also tends towards zero, providing a more robust summary statistic than the arithmetic mean. The F-score ranges from 0 to 100, with higher values indicating better performance. The choice of the cutoff value $d$ is discussed below with the precision-recall curves; a minimal implementation sketch of these point-cloud metrics follows this list.

     For quantifying the quality of a NeRF-rendered 2D image against a validation image (left out from NeRF training), the following metrics are used:

  4. Learned Perceptual Image Patch Similarity (LPIPS)[49]: To quantify the perceptual difference between two image patches $x$ and $x_0$, LPIPS employs activations from a neural network $F$. Features are extracted from $L$ layers and normalized across the channel dimension. For each layer $l$, the normalized features are denoted $\hat{y}^{l}$ and $\hat{y}^{l0}$, which lie in $\mathbb{R}^{H_l\times W_l\times C_l}$. These are weighted channel-wise by a vector $w_l\in\mathbb{R}^{C_l}$, and the perceptual distance is computed using the $\ell_2$ norm, both spatially and across channels:

     $$d(x,x_0)=\sum_{l}\frac{1}{H_l W_l}\sum_{h,w}\left\|w_l\odot\left(\hat{y}^{l}_{hw}-\hat{y}^{l0}_{hw}\right)\right\|_2^2 \qquad (6)$$

     This distance $d(x,x_0)$ is a scalar indicating the perceptual dissimilarity between the patches. The vector $w_l$ weights the contribution of each channel; setting $w_l=1/\sqrt{C_l}$ effectively measures the cosine distance, highlighting the directional alignment of the feature vectors rather than their magnitude. LPIPS ranges from 0 to 1, with lower values indicating better performance.

  5. Peak Signal-to-Noise Ratio (PSNR)[50]: The PSNR between a reference image and a reconstructed image is defined as:

     $$\text{PSNR}=10\cdot\log_{10}\left(\frac{\text{MAX}_I^2}{\text{MSE}}\right), \qquad (7)$$

     where $\text{MAX}_I$ is the maximum possible pixel value of the image, and MSE is the mean squared error between the reference and the reconstructed image:

     $$\text{MSE}=\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(I(i,j)-K(i,j)\right)^2, \qquad (8)$$

     where $I$ is the reference image, $K$ is the reconstructed image, and $m$ and $n$ are the image dimensions. Higher PSNR values indicate better performance.

  6. Structural Similarity Index (SSIM)[51]: SSIM predicts the perceived quality of digital images and videos and is designed to improve on traditional measures such as PSNR and MSE, which can be inconsistent with human perception. The SSIM index between two images $x$ and $y$ is defined as:

     $$\text{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}, \qquad (9)$$

     where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ their variances, $\sigma_{xy}$ their covariance, and $C_1$, $C_2$ constants that stabilize the division with a weak denominator. These last three metrics do not require 3D ground truth and are widely used in the literature[52, 53]. SSIM ranges from -1 to 1, with higher values indicating better performance.
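The 3D metrics above reduce to nearest-neighbor queries between the two point clouds. Below is a minimal SciPy sketch of Eqs. (3)-(5) (our illustration; the toy inputs stand in for the exported NeRF point cloud and the TLS ground truth), with a loop that sweeps the threshold $d$ as in the precision-recall curves discussed next:

```python
import numpy as np
from scipy.spatial import cKDTree

def precision_recall_f1(rec, gt, d):
    """Eqs. (3)-(5): rec and gt are (N,3)/(M,3) point arrays, d the threshold."""
    dist_rg, _ = cKDTree(gt).query(rec)   # nearest GT point per reconstructed point
    dist_gr, _ = cKDTree(rec).query(gt)   # nearest reconstructed point per GT point
    P = 100.0 * np.mean(dist_rg < d)
    R = 100.0 * np.mean(dist_gr < d)
    F = 2.0 * P * R / (P + R) if P + R > 0 else 0.0
    return P, R, F

# toy data; in practice load the registered NeRF and TLS point clouds instead
rng = np.random.default_rng(0)
rec, gt = rng.random((50_000, 3)), rng.random((50_000, 3))
for d in (0.0025, 0.005, 0.01):   # sweeping d traces a precision-recall curve
    print(d, precision_recall_f1(rec, gt, d))
```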

Precision-Recall curves:

Precision-recall curves are used to methodically evaluate how changes in the distance threshold $d$ influence the precision $P(d)$ and recall $R(d)$ metrics, demonstrating the trade-off between these measurements under varying threshold conditions. To set the value of $d$ for the final assessment, we opt for a conservative estimate before the precision-recall curves plateau. For indoor scenarios, assuming a hypothetical reference grid of 128×128×128, we set $d$ to 0.005. In this case the voxel size is $1/128\approx 0.0078$, so the threshold of 0.005 is smaller than a voxel: points must be closer than the dimensions of a single voxel to be identified as distinct, prioritizing detail sensitivity within a hypothetically coarser grid. Such a setting is especially pertinent for capturing the complex geometries of indoor plants, where precision in detail is crucial. Due to the size and complexity of the scene, a threshold of 0.01 is selected for outdoor plant reconstructions.

2.5 Early Stopping of NeRF Training using LPIPS

In training NeRFs for plant scene reconstruction, the F1 score is essential for validating the accuracy of the reconstructed point cloud against the ground truth. The inherent challenge during the training phase is the absence of ground truth, which is, paradoxically, exactly what the reconstruction is meant to match. Moreover, NeRF training is notoriously compute-intensive, and the cumulative costs become challenging when scaled to multiple scenes or extensive agricultural fields.

Figure 4 shows the scatter plots of PSNR, SSIM, and LPIPS scores against the F1 score, alongside their respective Pearson correlation coefficients. This visualization offers an immediate visual assessment of the relationships between these metrics and allows for a nuanced understanding of how accurately each metric predicts the true F1 score. The exceptionally strong negative correlation between LPIPS and F1 score (-0.82) reinforces the notion that LPIPS effectively captures the perceptual similarity between the reconstructed and ground truth point clouds, making it a reliable proxy for the F1 score, the ultimate measure of reconstruction fidelity.

[Figure 4: F1 score plotted against PSNR, SSIM, and LPIPS for Scenarios I-III]

Algorithm 1 DetectPlateau(𝒮, 𝒢, θ, C)
Inputs: sets of rendered validation images per checkpoint 𝒮, corresponding sets of GT images 𝒢, threshold θ, consistency length C. Output: plateau point P.

 1: procedure DetectPlateau(𝒮, 𝒢, θ, C)
 2:   Initialize an empty list ℳ to store average LPIPS values over training iterations.
 3:   for each set ℐ and corresponding GT set 𝒢_I in 𝒮 and 𝒢 do
 4:     sumLPIPS ← 0
 5:     for each image I and corresponding GT image G in ℐ and 𝒢_I do
 6:       sumLPIPS ← sumLPIPS + LPIPS(I, G)
 7:     end for
 8:     avgLPIPS ← sumLPIPS / |ℐ|; append avgLPIPS to ℳ
 9:   end for
10:   if |ℳ| < C then
11:     return 0                          ▷ insufficient data for plateau detection
12:   end if
13:   for i = 1 to |ℳ| − 1 do
14:     consistent ← True
15:     for j = max(1, i − C + 1) to i do
16:       if |ℳ[j] − ℳ[j−1]| ≥ θ then
17:         consistent ← False; break
18:       end if
19:     end for
20:     if consistent then
21:       return i                        ▷ plateau point detected, output P
22:     end if
23:   end for
24:   return |ℳ| − 1                      ▷ no plateau detected, output P
25: end procedure

The significant negative correlations between LPIPS and the F1 score (-0.82), PSNR (-0.81), and SSIM (-0.69) underscore the impact of LPIPS on the quality of 3D reconstruction (see the supplementary material for the full correlation matrix). The high magnitude of these coefficients, particularly the -0.82 with the F1 score, indicates that LPIPS is a robust predictor of reconstruction accuracy: as perceptual similarity improves (i.e., LPIPS decreases), the fidelity of the reconstructed point cloud to the ground truth improves correspondingly. This not only suggests the utility of LPIPS as a stand-in metric when ground truth is unavailable but also highlights it as a more informative indicator than traditional metrics such as PSNR (0.58) and SSIM (0.37) of the overall quality of NeRF-generated reconstructions.

Given this strong correlation, LPIPS emerges as a promising surrogate metric for early stopping during NeRF training. By monitoring LPIPS, one can infer the likely F1 score and make informed decisions about halting the training process. This method could decrease computational costs and time, as one need not await the completion of full training to predict its efficacy in terms of F1 score.

Algorithm for Plateau Detection: The plateau detection algorithm identifies a stabilization point in a series of metric values, such as LPIPS. The algorithm computes the average Learned Perceptual Image Patch Similarity (LPIPS) for each set of images in 𝒮 against the corresponding ground-truth images in 𝒢, then scans the sequence of average LPIPS values for a plateau, using a specified threshold θ and a consistency length C. Detecting the plateau point P indicates an optimal stopping point for the training process. To validate the efficacy of the early stopping algorithm, we applied it to a diverse dataset comprising five plant types captured in both indoor and outdoor settings. The threshold θ was set to 0.005 and the consistency length C to 6; the granularity of interpolation was set to 1000, spanning a total of 60,000 training iterations. These hyperparameters were chosen based on empirical observations to balance computational efficiency and reconstruction accuracy.
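A compact Python rendition of this procedure is sketched below, using the lpips package as the perceptual metric. It is a minimal sketch under the stated hyperparameters (θ = 0.005, C = 6); image loading, checkpointing, and the rendering loop are omitted.

```python
import numpy as np
import torch
import lpips   # pip install lpips

loss_fn = lpips.LPIPS(net="alex")   # perceptual metric used for early stopping

def avg_lpips(renders, gts):
    """Average LPIPS over one checkpoint's held-out validation renders."""
    def to_tensor(im):  # HxWx3 uint8 -> 1x3xHxW float in [-1, 1]
        return torch.from_numpy(im).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    with torch.no_grad():
        vals = [loss_fn(to_tensor(r), to_tensor(g)).item()
                for r, g in zip(renders, gts)]
    return float(np.mean(vals))

def detect_plateau(m, theta=0.005, C=6):
    """Return the first checkpoint index at which the consecutive differences
    over the last C average-LPIPS values all fall below theta."""
    if len(m) < C:
        return 0                      # insufficient data for plateau detection
    for i in range(C - 1, len(m)):
        if all(abs(m[j] - m[j - 1]) < theta for j in range(i - C + 2, i + 1)):
            return i                  # plateau point P: stop training here
    return len(m) - 1                 # no plateau detected
```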

3 Results

We evaluated the performance of NeRF models across various scenarios, from controlled indoor environments to complex outdoor field conditions, using key performance metrics to assess their efficacy in 3D plant reconstruction. The NeRFs were trained on an NVIDIA A100 GPU with 80 GB of GPU RAM attached to an AMD EPYC 7543 32-core CPU with 503 GB of CPU RAM. Post-training, the models are converted into point clouds with approximately a million points each. Estimated camera poses from COLMAP are visualized in Figure 5, and a summary of the performance metrics for the three scenarios is given in Table 2. 3D evaluation metrics are presented in this section; for a more granular analysis of the 2D image metrics, please refer to the Supplement. Visually, the performance of each model can be assessed via precision and recall, as shown in Figure 6. The precision-recall curves of the different scenarios for different threshold values are shown in Figure 7.

Visualization Color Code: The color-coded visualizations employed provide an intuitive understanding of spatial relationships within the 3D reconstructed plant structures. The interpretation of colors is as follows:

  • Grey: (Correct) Represents points within a predefined distance threshold relative to the reference point cloud. This color indicates accurate points in precision and recall evaluations, where precision assesses the reconstruction against the ground truth, and recall evaluates the ground truth against the reconstruction.

  • Red: (Missing) Depicts points in the point cloud being tested that are beyond the distance threshold but within 3 standard deviations from the nearest point in the reference point cloud. These points are considered inaccuracies, showing missing details in the reconstruction when assessing precision and highlighting missing elements in the ground truth during recall analysis.

  • Black: (Outlier) Highlights points in the point cloud being tested that are more than 3 standard deviations away from any point in the reference point cloud. These points are extreme outliers and represent significant errors in the reconstruction relative to the ground truth for precision evaluations, and similarly significant discrepancies in the ground truth relative to the reconstruction for recall.

[Figure 5: COLMAP-estimated camera poses for the three scenarios]
Table 2: Summary of performance metrics for the three scenarios after complete training.

| Scenario | Model | Precision ↑ | Recall ↑ | F1 ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Time (s) ↓ |
|---|---|---|---|---|---|---|---|---|
| I | Instant-NGP | 24.66 | 90.62 | 38.77 | 23.41 | 0.81 | 0.17 | 756 |
| I | TensoRF | 9.58 | 43.34 | 15.69 | 14.69 | 0.55 | 0.66 | 1973 |
| I | NeRFacto | 73.57 | 94.72 | 82.81 | 22.24 | 0.73 | 0.12 | 1938 |
| II | Instant-NGP | 23.45 | 58.57 | 33.49 | 19.08 | 0.64 | 0.31 | 1886 |
| II | TensoRF | 20.5 | 55.34 | 29.91 | 15.54 | 0.42 | 0.56 | 2607 |
| II | NeRFacto | 64.47 | 76.8 | 70.1 | 18.93 | 0.64 | 0.25 | 1226 |
| III | Instant-NGP | 15.06 | 59.55 | 24.04 | 18.54 | 0.47 | 0.4 | 1466 |
| III | TensoRF | 40.95 | 75.62 | 53.13 | 17.32 | 0.39 | 0.55 | 1965 |
| III | NeRFacto | 68.29 | 82.32 | 74.65 | 16.7 | 0.32 | 0.34 | 1499 |

3.1 Scenario I - Single Plants Indoors

We first look at the results of reconstructing a single plant in an indoor environment. Detailed evolution of each metric over training iterations is given in the Supplement.

Precision: For Scenario I, NeRFacto achieved the highest precision after 30,000 iterations, followed by Instant-NGP and TensoRF (see Figure 6 and Table 2). Across all models, precision generally increases with the number of iterations. For a detailed evaluation of the change in precision with iterations, please refer to the Supplement.

[Figure 6: Precision (top rows) and recall (bottom rows) visualizations for Scenarios I-III, for Instant-NGP, TensoRF, and NeRFacto]

Recall: The recall metric follows a similar trend, with Instant-NGP and NeRFacto showing increases with more iterations, indicating an enhanced ability to encompass points from the ground truth. Notably, NeRFacto achieves remarkably high recall values (over 90) at higher iterations, suggesting its superiority in the completeness of reconstruction. TensoRF’s recall values are significantly lower, indicating that it may miss more details from the ground truth compared to the other models.

F1 Score: The F1 score, balancing precision and recall, highlights NeRFacto as the most balanced model, especially at higher iterations, with scores above 80. Instant-NGP shows a marked improvement in F1 score as iterations increase but does not reach NeRFacto's peak. TensoRF lags in this metric, indicating a less balanced performance between precision and recall.

Computation Time: Time efficiency is a crucial factor, especially for practical applications. Instant-NGP demonstrates a relatively balanced trade-off between efficiency and performance, with time increments correlating reasonably with the increase in iterations, although it becomes time-consuming at high iteration counts (20,000 and 30,000). NeRFacto, while performing better on many metrics, demands considerably more time, especially at higher iterations, which could be limiting in time-sensitive scenarios. TensoRF, despite its lower performance on other metrics, maintains more consistent time efficiency, suggesting its suitability for applications where time is a critical constraint. The evolution of precision over training time and over training iterations is given in the supplementary material.

Overall Performance and Suitability: In sum, NeRFacto emerges as the most robust model in terms of precision, recall, F1 score, and image quality metrics (PSNR, SSIM, LPIPS), making it highly suitable for applications demanding high accuracy and completeness in 3D modeling. However, its time inefficiency at higher iterations might restrict its use in time-sensitive contexts. Instant-NGP presents a good balance between performance and efficiency, making it a viable option for moderately demanding scenarios. Detailed results after complete training are given in Table 2; for a more granular look at each metric over the training iterations for all algorithms, consult the Supplement. The precision-recall curves for varying distance thresholds after the maximum training of 30,000 iterations are given in Figure 7.

[Figure 7: Precision and recall (percentage of points) versus distance threshold (m) for Scenarios I-III]

Insight 1: Computational Cost and Accuracy Trade-off in Instant-NGP and NeRFacto: The steep increase in performance metrics with the number of iterations for both Instant-NGP and NeRFacto suggests that these models require a substantial amount of data processing to achieve high accuracy, which is critical in high-fidelity 3D modeling. However, this also implies a higher computational cost, which needs to be considered in practical applications.

Insight 2: Model Suitability in High-Detail 3D Reconstructions: The significant disparity in the performance of TensoRF compared to the other two models, particularly in precision and recall, indicates that not all NeRF models are equally suited for tasks requiring high-detail 3D reconstructions. This highlights the importance of model selection based on the specific requirements of the application.

Insight 3: Divergence in 2D Image Quality and 3D Reconstruction in Instant-NGP: Instant-NGP performs strongly on 2D image quality metrics such as PSNR, SSIM, and LPIPS, reflecting its ability to produce well-rendered images. However, this strength does not carry over to 3D reconstruction metrics such as precision, recall, and F1 score. Optimizing for high-quality image rendering is evidently a different challenge from achieving accurate 3D representations: a model adept at rendering highly detailed 2D images is not necessarily effective at accurately reconstructing complex 3D structures, particularly intricate plant models. This underscores the need for a nuanced evaluation of models tasked with both 2D image rendering and 3D spatial reconstruction.
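For context on why strong 2D scores need not imply accurate geometry, recall the standard definition of PSNR [50], a purely pixel-wise quantity computed between a rendered image and its ground-truth photograph:

\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right), \qquad \mathrm{MSE} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(I(i,j) - \hat{I}(i,j)\bigr)^{2},

where I is the ground-truth image, \hat{I} is the rendering, and \mathrm{MAX}_I is the maximum pixel value (e.g., 255 for 8-bit images). Nothing in this quantity constrains the underlying density field, so a model can render well from the training viewpoints while its recovered 3D geometry remains inaccurate.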

3.2 Scenario II - Multiple Plants Indoors

We observe marked differences in model behaviors compared to the single plant scenario, likely attributed to the added intricacy of multiple plants in a single scene. Detailed evolution of each metric over training iterations is given in the Supplement.

Precision: As shown in Figure 6, Instant-NGP exhibits a steady increase in precision with more iterations, peaking at a high value. NeRFacto, however, starts at a higher precision and reaches an even higher peak, indicating a more accurate reconstruction of the corn plants. TensoRF, although improving with more iterations, lags behind the others in terms of precision.

Recall: A similar pattern is observed for recall, with NeRFacto consistently maintaining a higher recall compared to the other methods, suggesting its ability to better encompass points in the ground truth. Both Instant-NGP and TensoRF exhibit increasing recall with more iterations, but at lower levels than NeRFacto.

F1 Score: The F1 Score, balancing precision and recall, follows a similar trend. NeRFacto demonstrates the best balance between precision and recall, with its F1 score peaking at 70.10, while Instant-NGP and TensoRF achieve lower peak F1 scores.

Computation Time: The time taken per iteration is crucial for efficiency. Instant-NGP and NeRFacto have comparable times, but TensoRF takes significantly longer at higher iterations, indicating lower time efficiency. See the supplementary material for the evolution of the precision metric over the training iterations.

Overall Performance and Suitability: NeRFacto emerges as the most balanced and efficient model, exhibiting high precision, recall, and F1 scores, along with favorable PSNR, SSIM, and LPIPS values. Its time efficiency is also comparable to Instant-NGP. Instant-NGP, while showing improvements, does not quite match NeRFacto's balance of precision and recall. TensoRF, despite its merits, falls behind in several key metrics, particularly precision, recall, SSIM, and LPIPS. The results after complete training are given in Table 2. For a more granular look at each metric over the training iterations for all algorithms, consult the supplementary material. The Precision-Recall curves for varying distance thresholds after the maximum training of 30,000 iterations are given in Figure 7.

Insight 1: Improved Performance of TensoRF in Scenario II: In the second scenario, TensoRF demonstrated an improvement compared to its performance in the first scenario. Specifically, its F1 score, a critical metric for 3D modeling accuracy, increased from 15.69 in the first scenario to 29.91 after 30,000 iterations in the second scenario. This improvement highlights TensoRF’s potential in more complex or demanding 3D modeling tasks, especially when allowed to complete its training process.

Insight 2: 2D Metrics Versus 3D F1 Score for Instant-NGP and NeRFacto: While Instant-NGP and NeRFacto show comparable results in 2D image quality metrics such as PSNR and SSIM, a distinct difference is observed in their 3D modeling capabilities, as reflected in their F1 scores, echoing the previous scenario. This suggests that NeRFacto might be a more reliable choice for applications requiring high accuracy in 3D reconstructions.

3.3 Scenario III - Multiple Plants Outdoors

Scenario III is the most complex, with multiple overlapping plants captured in field conditions. The models in this scenario were trained for up to 60,000 iterations, whereas the previous two scenarios were trained for only 30,000 iterations. The detailed evolution of each metric over the training iterations is given in the Supplement.

Precision: As observed in Figure 6, NeRFacto consistently demonstrates the highest precision across all iterations, peaking at 68.29%, suggesting its ability to reconstruct points close to the ground truth. Instant-NGP shows a steady increase in precision with more iterations, while TensoRF, although starting lower, reaches a precision comparable to Instant-NGP at higher iterations.

Recall: NeRFacto leads in recall, achieving a high of 82.32%, indicating its effectiveness in encompassing points from the ground truth. Instant-NGP shows significant improvement in recall with increased iterations, but remains behind NeRFacto. TensoRF’s recall growth positions it between Instant-NGP and NeRFacto in terms of completeness.

F1 Score: Reflecting the balance between precision and recall, NeRFacto emerges as the superior model, with its F1 score peaking at 74.65%. Instant-NGP’s F1 score improves with more iterations but remains significantly lower, while TensoRF’s F1 score surpasses Instant-NGP, reaching 53.13%.

Computation Time: In terms of efficiency, Instant-NGP and NeRFacto are the fastest, followed by TensoRF.

Overall Performance and Suitability: NeRFacto again emerges as the most balanced and robust model, excelling in precision, recall, F1 score, and LPIPS. Detailed results after complete training are given in Table 2. For a more granular look at each metric over the training iterations for all algorithms, consult the supplementary material. The Precision-Recall curves for varying distance thresholds after the maximum training of 60,000 iterations are given in Figure 7. GPU memory usage in this scenario stays approximately constant at 3 GB (out of 80 GB of total GPU memory).

Insight 1: Enhanced Performance of TensoRF in Outdoor Settings: TensoRF demonstrates significant improvement in the third scenario compared to the first. Specifically, its F1 score increased from 15.69 in the first scenario to 29.91 in the second, reaching 53.13 after 30,000 iterations in the current outdoor scenario. This upward trajectory in F1 scores, a balanced measure of precision and recall, indicates TensoRF's enhanced capability in outdoor environments, potentially outperforming Instant-NGP in these settings. This suggests that TensoRF might be a more suitable choice for outdoor 3D modeling tasks where both precision and completeness are crucial. This property may have contributed to the selection of TensoRF as a building block for multiple local radiance fields in in-the-wild reconstruction [54].

Insight 2: LPIPS as a Strong Indicator of 3D Model Quality: The LPIPS metric appears to be the most representative measure of the quality of the resulting 3D models: models with lower (better) LPIPS scores consistently perform better across the other metrics. This trend indicates the relevance of LPIPS in assessing the perceptual quality of 3D models. Further investigation into how LPIPS correlates with the other metrics could provide deeper insights into model performance, especially in the context of realistic and perceptually accurate 3D reconstructions.
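For reference, LPIPS between a rendered view and its ground-truth photograph can be computed with the reference implementation of Zhang et al. [49]. The snippet below is a minimal sketch; the file names and preprocessing are illustrative, and the two images must share the same resolution.

```python
# Minimal sketch: LPIPS between a rendered view and the corresponding
# ground-truth photo, using the reference implementation of Zhang et al. [49]
# (pip install lpips). File names are illustrative; both images must have the
# same resolution.
import lpips
import numpy as np
import torch
from PIL import Image

def to_lpips_tensor(path):
    """Load an RGB image and scale to [-1, 1] with shape (1, 3, H, W), as LPIPS expects."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0) * 2.0 - 1.0

loss_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, as in the original paper
with torch.no_grad():
    d = loss_fn(to_lpips_tensor("render.png"), to_lpips_tensor("ground_truth.png"))
print(f"LPIPS: {d.item():.4f}")  # lower is better
```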

3.4 Early Stopping Algorithm

The implementation of early stopping based on the LPIPS metric yielded substantial savings in computational time across all scenarios, with a minor sacrifice in the fidelity of the 3D reconstructions, as measured by the F1 score. Time savings were notable across the three tested methods (Instant-NGP, TensoRF, and NeRFacto), with each showing a marked decrease in training time without a commensurate loss in F1 score accuracy. For a deeper look at LPIPS, the F1 score, and the recommended stopping point for each case, consult the supplementary material.

On average, the early stopping strategy resulted in a 61.1% reduction in training time, a significant efficiency gain in the process of 3D plant reconstruction using neural radiance fields. Concurrently, the average F1 score loss was contained to 7.4%, indicating that early plateau detection has only a moderate impact on the quality of the 3D point cloud reconstructions. Specifically, Instant-NGP presented a more pronounced variation in F1 score loss, which was notably higher in Scene-III, thereby affecting its average loss more than TensoRF and NeRFacto. TensoRF and NeRFacto showed remarkable consistency in time savings, mirrored in their comparable F1 score losses, highlighting the robustness of these methods in early stopping scenarios.

These findings articulate a compelling case for the utilization of early stopping in NeRF-based 3D reconstruction tasks, emphasizing the need to balance between computational resources and reconstruction precision. Such a balance is pivotal in scenarios where time efficiency is paramount yet a minimal compromise on reconstruction accuracy is permissible.

3.5 Scenario IV - Validation Examples in Field Conditions

Figure 8: Qualitative validation on five scenes (Scene-1 to Scene-5). Each row corresponds to one validation scene and shows the rendered point clouds after 1,000 iterations, at the recommended early-stopping iteration, and after the full 60,000 training iterations.

The efficacy of the LPIPS-based early stopping algorithm was validated using a diverse dataset comprising images from five different types of plants captured in both indoor and outdoor settings, as illustrated in Figure 8. The validation process employed a threshold θ set to 0.005 and a consistency length C of 6, with the granularity of interpolation fixed at 1000, spanning a total of 60,000 training iterations. Because the training checkpoints are exponentially spaced, the metric values were linearly interpolated onto a uniform grid before running the algorithm. Figure 8 shows the rendered point clouds at three stages: after 1,000 iterations, at the recommended early stopping iteration, and upon completing the full 60,000 iterations of training. Each row of the figure corresponds to one of the five validation scenes, providing a qualitative comparative analysis.
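For concreteness, a minimal sketch of the plateau-detection logic is shown below, with the parameters from the text (θ = 0.005, C = 6, interpolation granularity 1000, 60,000 iterations); the function itself is an illustrative reconstruction rather than the exact implementation.

```python
# Minimal sketch of the LPIPS-based plateau detection: interpolate LPIPS values
# from exponentially spaced checkpoints onto a uniform grid, then recommend
# stopping once the change stays below theta for C consecutive steps.
# Parameter values follow the text; the function itself is illustrative.
import numpy as np

def recommend_stop(ckpt_iters, ckpt_lpips, theta=0.005, C=6,
                   granularity=1000, max_iters=60000):
    """Return the recommended early-stopping iteration (max_iters if no plateau is found)."""
    grid = np.arange(granularity, max_iters + 1, granularity)
    lpips_interp = np.interp(grid, ckpt_iters, ckpt_lpips)  # linear interpolation

    run = 0
    for i, delta in enumerate(np.diff(lpips_interp)):
        run = run + 1 if abs(delta) < theta else 0
        if run >= C:                 # plateau has held for C consecutive steps
            return int(grid[i + 1])
    return max_iters

# Example with illustrative checkpoint values:
# iters = [1000, 2000, 4000, 8000, 16000, 32000, 60000]
# vals  = [0.45, 0.30, 0.22, 0.18, 0.166, 0.163, 0.162]
# print(recommend_stop(iters, vals))
```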

Notably, for all indoor scenes, the algorithm recommended halting training at 20000 iterations, whereas for outdoor scenes, the suggestion extended to 30000 iterations. This distinction underscores the algorithm’s sensitivity to environmental variables affecting perceptual similarity metrics. The rendered point clouds, particularly at the early stopping points, exhibit minimal visual discrepancies when compared to those obtained after the full training duration. By reducing computational demands without much loss in fidelity, this approach is a cost-effective strategy for enhancing modeling throughput in precision agriculture and botanical research. We substantiate the hypothesis that LPIPS can serve as a reliable surrogate for direct F1 score estimation in the context of NeRF training. The algorithm’s ability to accurately predict optimal stopping points—balancing computational efficiency with reconstruction accuracy—presents a compelling case for its adoption in scenarios where resource conservation is paramount, yet quality cannot be entirely sacrificed.

4 Discussion

In this section, we discuss the findings from our comparative analysis of NeRF models for 3D plant reconstruction. Across our experiments, Nerfacto produced the highest-quality 3D reconstructions, and we explore the theoretical basis for its superiority by examining the sampling strategies employed by the different models. In particular, we take a deeper look at the sampling strategies used by Nerfacto and Instant-NGP and how they influence the visual quality and level of detail in the rendered scenes.

The divergent performance of the NeRF models necessitates a deeper examination of their underlying sampling strategies and their influence on the quality of 3D reconstruction. The difference in the output quality between Instant-NGP and Nerfacto, especially concerning the density and crispness of the rendered scenes, could indeed be related to the sampling strategies used by each algorithm.

Instant-NGP Sampling Strategy: Instant-NGP uses an improved training and rendering algorithm that involves a ray marching scheme with an occupancy grid. This means that when the algorithm shoots rays into the scene to sample colors and densities, it uses an occupancy grid to skip over empty space, as well as areas behind high-density regions to improve efficiency.

The occupancy grid used in Instant-NGP is a multiscale grid that coarsely marks empty and non-empty space and is used to determine where to skip samples to speed up processing. This approach is quite effective in terms of speed, leading to significant improvements over naive sampling methods. However, if the occupancy grid isn’t fine-grained enough or if the method for updating this grid isn’t capturing the scene’s density variations accurately, it could lead to a “muddy” or overly dense rendering because it might not be sampling the necessary areas with enough precision.
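To illustrate the mechanism, the sketch below shows a simplified form of occupancy-grid sample skipping along a single ray. The function, grid handling, and step sizes are illustrative; Instant-NGP's actual implementation uses a multiscale occupancy bitfield inside fused CUDA kernels.

```python
# Simplified, illustrative sketch of occupancy-grid sample skipping during ray
# marching. Instant-NGP's actual implementation uses a multiscale occupancy
# bitfield inside fused CUDA kernels; this only conveys the idea.
import numpy as np

def march_ray(origin, direction, occupancy, scene_min, scene_max,
              step=0.01, t_max=5.0):
    """Collect sample positions along a ray, skipping grid cells marked empty."""
    grid_res = np.array(occupancy.shape)
    cell_size = (scene_max - scene_min) / grid_res
    samples, t = [], 1e-4
    while t < t_max:
        p = origin + t * direction
        idx = np.floor((p - scene_min) / cell_size).astype(int)
        if np.any(idx < 0) or np.any(idx >= grid_res):
            break                      # ray has left the scene bounds
        if occupancy[tuple(idx)]:
            samples.append(p)          # non-empty cell: take a sample here
            t += step
        else:
            t += cell_size.min()       # empty cell: skip ahead a whole cell
    return np.array(samples)
```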

NeRFacto Sampling Strategy: Nerfacto, on the other hand, uses a combination of different sampling techniques:

  • Camera Pose Refinement: By refining camera poses, Nerfacto ensures that the samples taken are based on more accurate viewpoints, which directly affects the clarity of the rendered images.

  • Piecewise Sampler: This sampler is used to produce an initial set of samples, with a distribution that allows both dense sampling near the camera and appropriate sampling further away. This could lead to clearer images since it captures details both near and far from the camera.

  • Proposal Sampler: This is a key part of the Nerfacto method. It uses a proposal network to concentrate sample locations in regions that contribute most to the final render, usually around the first surface intersection. This targeted sampling could be a major reason why Nerfacto produces crisper images: it focuses computational resources on the most visually significant parts of the scene (see the sketch after this list).

  • Density Field: By using a density field guided by a hash encoding and a small fused MLP, Nerfacto can efficiently guide sampling even further. It doesn’t require an extremely detailed density map since it is used primarily for guiding the sampling process, which means that it balances quality and speed without necessarily impacting the final image’s detail.
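To make the proposal-sampling step concrete, the following sketch distills the core two-stage pattern: coarse samples along a ray are weighted by a proposal density, and the final samples are drawn by inverse-CDF resampling so they concentrate near the first surface. This is an illustrative distillation, not Nerfacto's actual implementation.

```python
# Illustrative two-stage proposal sampling: weight coarse samples along a ray
# by a proposal density, then draw fine samples via inverse-CDF resampling so
# they concentrate where the density (and visual contribution) is highest.
# A distilled sketch, not Nerfacto's actual implementation.
import numpy as np

def proposal_resample(t_coarse, proposal_density, n_fine=64):
    """t_coarse: sorted sample distances along a ray; proposal_density: density at each sample."""
    # Convert per-sample densities into normalized interval weights.
    weights = proposal_density[:-1] * np.diff(t_coarse)
    weights = weights / (weights.sum() + 1e-10)

    # Inverse-CDF sampling: more fine samples where the proposal weight is high.
    cdf = np.concatenate([[0.0], np.cumsum(weights)])
    u = np.random.uniform(0.0, 1.0, n_fine)
    return np.sort(np.interp(u, cdf, t_coarse))

# Example: a density spike near t = 1.2 (the first surface) pulls the fine
# samples toward it.
# t = np.linspace(0.1, 4.0, 128)
# sigma = np.exp(-((t - 1.2) ** 2) / 0.01)
# t_fine = proposal_resample(t, sigma)
```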

Instant-NGP's sampling strategy is built for speed, with an occupancy grid that helps skip irrelevant samples. This approach is great for real-time applications but can potentially miss subtle density variations, leading to a denser and less clear output if the grid is not capturing all the necessary detail. Nerfacto's sampling strategy is more complex and layered, with multiple mechanisms in place to ensure that sampling is done more effectively in areas that greatly affect the visual output. The combination of pose refinement, piecewise sampling, proposal sampling, and an efficient density field leads to more accurate sampling, which in turn produces crisper images. In summary, Nerfacto's better reconstruction likely stems from its more refined and targeted approach to sampling, which concentrates computational effort on the most visually impactful parts of the scene. In contrast, Instant-NGP's faster but less targeted sampling may result in less clarity and more visual artifacts.

Finally, to retrieve the scale of the 3D reconstruction in the absence of reference point cloud data, a reference object of known size can be placed on the ground during data collection. The exported point cloud can then be proportionally scaled based on this reference, calibrating the reconstructed plant to its real-world dimensions. To demonstrate the practicality of this approach, we placed a 3D-printed sphere of known diameter in the scene for Scenario I and captured the images. We then ran our NeRF reconstruction pipeline and, instead of registering and scaling the scene to the ground-truth LiDAR data, scaled it to the known sphere size. Measuring the height of the plant in this calibrated reconstruction, we find the error to be within 1%. We provide additional details of this experiment in the Supplement. We note that this is a preliminary result, and more detailed studies on extracting the correct scale from NeRF reconstructions need to be performed in the future.
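The calibration itself reduces to a single ratio. The sketch below assumes the sphere's diameter has already been measured in the unscaled reconstruction (e.g., by fitting a sphere to its segmented points); the function name and values are illustrative.

```python
# Minimal sketch of metric scale calibration from a reference sphere of known
# size. Assumes the sphere's diameter has already been measured in the
# unscaled reconstruction (e.g., by fitting a sphere to its segmented points);
# names and values are illustrative.
import numpy as np

def calibrate_scale(points, measured_diameter, true_diameter_m):
    """Uniformly scale an (N, 3) point cloud so the reference sphere matches its real size."""
    return points * (true_diameter_m / measured_diameter)

# pts_scaled = calibrate_scale(pts, measured_diameter=0.37, true_diameter_m=0.10)
# plant_height_m = pts_scaled[:, 2].max() - pts_scaled[:, 2].min()
```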

5 Conclusions

The findings of this research underscore the value of NeRFs as a non-destructive approach for 3D plant reconstruction in precision agriculture. By accurately reconstructing plant structures, our methodology facilitates critical agricultural tasks such as growth monitoring, yield prediction, and early disease detection. Our comparative analysis, which benchmarks different NeRF models against ground truth data, highlights the method's efficiency, achieving a 74.65% F1 score within 30 minutes of GPU training. Introducing an early stopping algorithm based on LPIPS further enhances this process, reducing training time by 61.1% while limiting the average F1 score loss to just 7.4%.

Additionally, our work provides a comprehensive dataset and an evaluation framework, aiding the validation of current models and serving as a foundation for developing future NeRF applications in agriculture. The detailed insights into model performance across varied scenarios, coupled with the early stopping case study, offer practical guidance for 3D reconstruction using NeRFs. This research supports the advancement of non-intrusive agricultural technologies and also sets a baseline for future work at the intersection of NeRF technologies and agriculture, aiming to improve efficiency and accuracy in plant phenotyping and breeding.

Acknowledgements

This work was supported in part by the Plant Science Institute at Iowa State University, the National Science Foundation under grant number OAC:1750865, and the National Institute of Food and Agriculture (USDA-NIFA) as part of the AI Institute for Resilient Agriculture (AIIRA), grant number 2021-67021-35329.

Supplementary Materials

Figures S1 to S9.
Tables S1 to S5.

Author Contributions

Muhammad Arbab Arshad: Methodology, Formal analysis, Software, Writing–Original Draft, Data Curation, Visualization.
Talukder Jubery: Data Curation, Supervision, Writing–Review & Editing.
James Afful: Data Curation.
Anushrut Jignasu: Data Curation.
Aditya Balu: Methodology, Supervision, Writing–Review & Editing.
Baskar Ganapathysubramanian: Conceptualization, Methodology, Supervision, Project administration.
Soumik Sarkar: Conceptualization, Methodology, Project administration, Funding acquisition.
Adarsh Krishnamurthy: Conceptualization, Methodology, Supervision, Writing–Review & Editing, Project administration, Funding acquisition.

Data Availability

The data for the four scenarios, including raw images and point clouds, will be made available online. The Git repository containing the different NeRF implementations will also be made public.

References

  • Pereira [2017] Luis Santos Pereira. Water, agriculture and food: challenges and issues. Water Resources Management, 31(10):2985–2999, 2017.
  • Kumar et al. [2012] Pankaj Kumar, Jinhai Cai, and Stan Miklavcic. High-throughput 3D modelling of plants for phenotypic analysis. In Proceedings of the 27th Conference on Image and Vision Computing New Zealand, pages 301–306, 2012.
  • Paturkar et al. [2021a] Abhipray Paturkar, Gourab Sen Gupta, and Donald Bailey. Making use of 3D models for plant physiognomic analysis: A review. Remote Sensing, 13(11):2232, 2021a.
  • Feng et al. [2023] Jiale Feng, Mojdeh Saadati, Talukder Jubery, Anushrut Jignasu, Aditya Balu, Yawei Li, Lakshmi Attigala, Patrick S. Schnable, Soumik Sarkar, Baskar Ganapathysubramanian, et al. 3D reconstruction of plants using probabilistic voxel carving. Computers and Electronics in Agriculture, 213:108248, 2023.
  • Cuevas-Velasquez et al. [2020] Hanz Cuevas-Velasquez, Antonio-Javier Gallego, and Robert B. Fisher. Segmentation and 3D reconstruction of rose plants from stereoscopic images. Computers and Electronics in Agriculture, 171:105296, 2020.
  • Sarkar et al. [2023] Soumik Sarkar, Baskar Ganapathysubramanian, Arti Singh, Fateme Fotouhi, Soumyashree Kar, Koushik Nagasubramanian, Girish Chowdhary, Sajal K. Das, George Kantor, Adarsh Krishnamurthy, Nirav Merchant, and Asheesh K. Singh. Cyber-agricultural systems for crop breeding and sustainable production. Trends in Plant Science, 2023.
  • Eltner and Sofia [2020] Anette Eltner and Giulia Sofia. Structure from motion photogrammetric technique. In Developments in Earth Surface Processes, volume 23, pages 1–24. Elsevier, 2020.
  • Chen et al. [2019] Rui Chen, Songfang Han, Jing Xu, and Hao Su. Point-based multi-view stereo network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1538–1547, 2019.
  • Wu et al. [2020] Sheng Wu, Weiliang Wen, Yongjian Wang, Jiangchuan Fan, Chuanyu Wang, Wenbo Gou, and Xinyu Guo. MVS-Pheno: A portable and low-cost phenotyping platform for Maize shoots using multiview stereo 3D reconstruction. Plant Phenomics, 2020.
  • Wang et al. [2022] Yinghua Wang, Songtao Hu, He Ren, Wanneng Yang, and Ruifang Zhai. 3DPhenoMVS: A low-cost 3D tomato phenotyping pipeline using 3D reconstruction point cloud based on multiview images. Agronomy, 12(8):1865, 2022.
  • Lu et al. [2009] Shenglian Lu, Chunjiang Zhao, Xinyu Guo, et al. Venation skeleton-based modeling plant leaf wilting. International Journal of Computer Games Technology, 2009, 2009.
  • Evers [2011] J. B. Evers. 3D modelling of branching in plants. In Proceedings of MODSIM2011, 19th International Congress on Modelling and Simulation, 12–16 December 2011, Perth, Australia, pages 982–988, 2011.
  • Andújar et al. [2018] Dionisio Andújar, Mikel Calle, César Fernández-Quintanilla, Ángela Ribeiro, and José Dorado. Three-dimensional modeling of weed plants using low-cost photogrammetry. Sensors, 18(4):1077, 2018.
  • Paturkar et al. [2019] Abhipray Paturkar, Gaurab Sen Gupta, and Donald Bailey. 3D reconstruction of plants under outdoor conditions using image-based computer vision. In Recent Trends in Image Processing and Pattern Recognition: Second International Conference, RTIP2R 2018, Solapur, India, December 21–22, 2018, Revised Selected Papers, Part III, pages 284–297. Springer, 2019.
  • Lu [2023] Guoyu Lu. Bird-view 3D reconstruction for crops with repeated textures. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4263–4270. IEEE, 2023.
  • Paturkar et al. [2021b] Abhipray Paturkar, Gourab Sen Gupta, and Donald Bailey. Effect on quality of 3D model of plant with change in number and resolution of images used: An investigation. In Advances in Signal and Data Processing: Select Proceedings of ICSDP 2019, pages 377–388. Springer, 2021b.
  • Liénard et al. [2016] Jean Liénard, Andre Vogs, Demetrios Gatziolis, and Nikolay Strigul. Embedded, real-time UAV control for improved, image-based 3D scene reconstruction. Measurement, 81:264–269, 2016.
  • Mildenhall et al. [2021] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  • Deng et al. [2022] Nianchen Deng, Zhenyi He, Jiannan Ye, Budmonde Duinkharjav, Praneeth Chakravarthula, Xubo Yang, and Qi Sun. FoV-NeRF: Foveated neural radiance fields for virtual reality. IEEE Transactions on Visualization and Computer Graphics, 28(11):3854–3864, 2022.
  • Tancik et al. [2022] Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Barron, and Henrik Kretzschmar. Block-NeRF: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8248–8258, 2022.
  • Nguyen et al. [2015] Thuy Tuong Nguyen, David C. Slaughter, Nelson Max, Julin N. Maloof, and Neelima Sinha. Structured light-based 3D reconstruction system for plants. Sensors, 15(8):18587–18612, 2015.
  • Tang et al. [2022] Xiaoying Tang, Mengjun Wang, Qian Wang, Jingjing Guo, and Jingxiao Zhang. Benefits of terrestrial laser scanning for construction QA/QC: a time and cost analysis. Journal of Management in Engineering, 38(2):1–10, 2022.
  • Jignasu et al. [2023] Anushrut Jignasu, Ethan Herron, Talukder Zaki Jubery, James Afful, Aditya Balu, Baskar Ganapathysubramanian, Soumik Sarkar, and Adarsh Krishnamurthy. Plant geometry reconstruction from field data using neural radiance fields. In 2nd AAAI Workshop on AI for Agriculture and Food Systems, 2023.
  • PolyCam [2023] PolyCam. Polycam - LiDAR and 3D scanner, 2023. URL https://poly.cam/. Accessed: 2024-03-01.
  • Schonberger and Frahm [2016] Johannes L. Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
  • Müller et al. [2022] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG), 41(4):1–15, 2022.
  • Chen et al. [2022] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. In European Conference on Computer Vision, pages 333–350. Springer, 2022.
  • Tancik et al. [2023] Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, et al. NeRFStudio: A modular framework for neural radiance field development. In ACM SIGGRAPH Conference Proceedings, pages 1–12, 2023.
  • Barron et al. [2022] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022.
  • Barron et al. [2021] Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
  • Azzarelli et al. [2023] Adrian Azzarelli, Nantheera Anantrasirichai, and David R. Bull. Towards a robust framework for NeRF evaluation. arXiv preprint arXiv:2305.18079, 2023.
  • Radl et al. [2024] Lukas Radl, Andreas Kurz, Michael Steiner, and Markus Steinberger. Analyzing the internals of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 2822–2831, 2024.
  • Li et al. [2023] Sicheng Li, Hao Li, Yue Wang, Yiyi Liao, and Lu Yu. SteerNeRF: Accelerating NeRF rendering via smooth viewpoint trajectory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20701–20711, 2023.
  • Remondino et al. [2023] Fabio Remondino, Ali Karami, Ziyang Yan, Gabriele Mazzacca, Simone Rigon, and Rongjun Qin. A critical analysis of NeRF-based 3D reconstruction. Remote Sensing, 15(14):3585, 2023.
  • Balloni et al. [2023] E. Balloni, L. Gorgoglione, M. Paolanti, A. Mancini, and R. Pierdicca. Few shot photogrammetry: A comparison between NeRF and MVS-SfM for the documentation of cultural heritage. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 48:155–162, 2023.
  • Liu et al. [2020] Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields. Advances in Neural Information Processing Systems, 33:15651–15663, 2020.
  • Reiser et al. [2021] Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14335–14345, 2021.
  • Yu et al. [2021] Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. PlenOctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752–5761, 2021.
  • Wu et al. [2022] Liwen Wu, Jae Yong Lee, Anand Bhattad, Yu-Xiong Wang, and David Forsyth. DIVeR: Real-time and accurate neural radiance fields with deterministic integration for volume rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16200–16209, 2022.
  • Yu et al. [2022] Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, and Andreas Geiger. MonoSDF: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in Neural Information Processing Systems, 35:25018–25032, 2022.
  • Yariv et al. [2021] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems, 34:4805–4815, 2021.
  • Wang et al. [2021] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In Advances in Neural Information Processing Systems, volume 34, pages 27171–27183, 2021.
  • Oechsle et al. [2021] Michael Oechsle, Songyou Peng, and Andreas Geiger. UNISURF: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5589–5599, 2021.
  • Knapitsch et al. [2017] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and Temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017.
  • Besl and McKay [1992] Paul J. Besl and Neil D. McKay. Method for registration of 3D shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, volume 1611, pages 586–606. SPIE, 1992.
  • Zhang [1994] Zhengyou Zhang. Iterative point matching for registration of free-form curves and surfaces. International Journal of Computer Vision, 13(2):119–152, 1994.
  • Billings et al. [2015] Seth D. Billings, Emad M. Boctor, and Russell H. Taylor. Iterative most-likely point registration (IMLP): A robust algorithm for computing optimal shape alignment. PLoS One, 10(3):e0117688, 2015.
  • Mazzacca et al. [2023] G. Mazzacca, A. Karami, S. Rigon, E. M. Farella, P. Trybala, and F. Remondino. NeRF for heritage 3D reconstruction. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 48:1051–1058, 2023.
  • Zhang et al. [2018] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
  • Hore and Ziou [2010] Alain Hore and Djemel Ziou. Image quality metrics: PSNR vs. SSIM. In International Conference on Pattern Recognition, pages 2366–2369. IEEE, 2010.
  • Wang et al. [2004] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  • Xu et al. [2022] Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point-NeRF: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5438–5448, 2022.
  • Zhang et al. [2021] Jason Zhang, Gengshan Yang, Shubham Tulsiani, and Deva Ramanan. NeRS: Neural reflectance surfaces for sparse-view 3D reconstruction in the wild. Advances in Neural Information Processing Systems, 34:1–12, 2021.
  • Meuleman et al. [2023] Andreas Meuleman, Yu-Lun Liu, Chen Gao, Jia-Bin Huang, Changil Kim, Min H. Kim, and Johannes Kopf. Progressively optimized local radiance fields for robust view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16539–16548, 2023.