Image
Caption

A high-angle wide view of a mountain range seen from the vantage point of an airplane. A river is extending from the bottom left side of the image toward the right side of the image. There are smaller mountains near the bottom right side of the image in front of the river. Behind the river are much taller mountains. There is a large mountain in the distance near the top right side of the image. The sky is visible extending across the very top of the image. There is a layer of gray fog over the mountains, the fog around the taller mountain in the distance is thicker.

Scene Graph
QA + verification programs
Q: What kind of view does the image provide of the mountain range?
A: The image provides a high-angle wide view of the mountain range, seen from an airplane.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'mountain range' in entity:
                return scene_graph.get_attributes(entity).get('state', '')
        return None

gpt-4o: The image provides an aerial view of a mountain range. The perspective is from above, likely taken from an airplane. The mountains appear rugged and expansive, with a mix of peaks and valleys. There is also a body of water visible in the middle of the image, which adds to the scenic quality of the landscape. The overall atmosphere is somewhat hazy, giving the scene a soft, distant look.
hscore: 67.52%
• <state (view, wide)> == <state (perspective, aerial)> (70.0%)
• <state (view, high)> == <state (perspective, from above)> (70.0%)
• <spatial (airplane, mountain range, from)> == <spatial (body of water, mountain range, in)> (70.0%)
tscore: 79.62%
• <whole (mountain range)> == <whole (mountain range)> (100.0%)
• <state (mountain range, expansive)> == <state (mountain range, high)> (80.0%)
• <spatial (body of water, mountain range, in)> == <spatial (mountain, distance, in)> (80.0%)
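The verification programs in this section rely on a small scene-graph interface made of `get_entities()`, `get_attributes(entity)` and `get_outgoing_relations(entity)`. The stub below is a minimal sketch of that interface, written only to make the first program runnable. It assumes three things: entities are plain name strings, attributes are a dict mapping attribute type to value, and outgoing relations map each target entity to a dict of relation type to value. The `SceneGraph` class, its constructor and the sample `state` value are hypothetical and were inferred from the calls alone. They are not the authors' implementation or the actual scene graph in the figure.

```python
class SceneGraph:
    """Hypothetical in-memory scene graph inferred from the calls in the programs."""

    def __init__(self, attributes, relations):
        # attributes: {entity: {attribute_type: value}}  (assumed layout)
        # relations:  {subject: {object: {relation_type: value}}}  (assumed layout)
        self._attributes = attributes
        self._relations = relations

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})


# The first verification program, unchanged apart from formatting.
def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        if 'mountain range' in entity:
            return scene_graph.get_attributes(entity).get('state', '')
    return None


# Illustrative contents only; not the scene graph used in the figure.
sg = SceneGraph(
    attributes={'mountain range': {'state': 'high-angle wide view'}, 'river': {}},
    relations={'mountain range': {'airplane': {'spatial': 'seen from'}}},
)
print(execute_command(sg))  # high-angle wide view
```

The program matches entities with a substring test on the name. It would therefore also fire on an entity named, say, `distant mountain range`.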
Image
Caption

An indoor close-up front view of a grey tabby cat wearing a collar with a silver ring, inside a gray wooden cage with white steel rods and five strings on each side. The cage is placed on a brown hardwood floor, where the light and shadows of the cat and cage are reflected. Behind the cage is a white door with a cream-colored wall and white floorboard to its side.

Scene Graph
QA + verification programs
Q: Where is the grey tabby cat located in the scene?
A: The grey tabby cat is located inside a grey wooden cage.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'cat' in entity and 'grey' in scene_graph.get_attributes(entity).get('color', ''):
                relations = scene_graph.get_outgoing_relations(entity)
                if 'cage' in relations and 'inside' in relations['cage']['spatial']:
                    return "inside a grey wooden cage"
        return None

gpt-4o: The grey tabby cat is located inside a white or light grey structure that appears to be a small enclosure or gate. The cat is sitting on a wooden floor, and the structure is positioned in front of a white door.
hscore: 76.74%
• <color (cat, grey tabby)> == <color (cat, grey tabby)> (100.0%)
• <spatial (cat, cage, inside)> == <spatial (cat, structure, inside)> (80.0%)
• <material (cage, wooden)> == <material (floor, wooden)> (60.0%)
tscore: 86.25%
• <whole (door)> == <whole (door)> (100.0%)
• <whole (cat)> == <whole (cat)> (100.0%)
• <color (cat, grey tabby)> == <color (cat, grey tabby)> (100.0%)
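The cat-location program indexes `relations['cage']['spatial']` directly. Most other programs in this section use `.get('spatial', '')` instead. The sketch below wraps that lookup in a small helper, so that a relation with no `spatial` entry yields `False` rather than a `KeyError`. It assumes the `{target: {relation_type: value}}` layout the programs appear to use. The helper name `has_spatial_relation`, the sample relations and the `support` relation type are all hypothetical.

```python
def has_spatial_relation(relations, target, keyword):
    """Return True if relations[target] has a 'spatial' value containing keyword.

    Assumes relations is a dict of {target_entity: {relation_type: value}}.
    """
    return keyword in relations.get(target, {}).get('spatial', '')


# Hypothetical outgoing relations for the cat entity ('support' is a made-up relation type).
cat_relations = {'cage': {'spatial': 'inside'}, 'floor': {'support': 'on'}}

print(has_spatial_relation(cat_relations, 'cage', 'inside'))   # True
print(has_spatial_relation(cat_relations, 'floor', 'inside'))  # False: no 'spatial' key
print(has_spatial_relation(cat_relations, 'door', 'inside'))   # False: no relation at all
```

Direct indexing such as `cat_relations['floor']['spatial']` would raise `KeyError` in the second case.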

Q: What is the texture and color of the floor beneath the cage?
A: The floor beneath the cage is brown and made of hardwood.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'floor' in entity:
                attrs = scene_graph.get_attributes(entity)
                if 'brown' in attrs.get('color', '') and 'hardwood' in attrs.get('material', ''):
                    return 'The floor beneath the cage is ' + attrs['color'] + ' and made of ' + attrs['material']
        return None

gpt-4o: The floor beneath the cage has a wooden texture and is light brown in color.
hscore: 78.79%
• <color (floor, brown)> == <color (floor, light brown)> (90.0%)
• <material (floor, hardwood)> == <material (floor, wooden)> (90.0%)
• <spatial (floor, cage, beneath)> == <whole (floor)> (40.0%)
tscore: 91.22%
• <whole (floor)> == <whole (floor)> (100.0%)
• <material (floor, wooden)> == <material (floor, hardwood)> (90.0%)
• <color (floor, light brown)> == <color (floor, brown)> (90.0%)

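Each score line pairs a tuple from one side with its closest tuple on the other side. The notation is `<type (arg1, arg2, ...)> == <type (...)> (similarity%)`. The sketch below parses that notation into Python values. It assumes that arguments never contain commas, parentheses or angle brackets, and that bare tags such as `<image>` carry no argument list. The function names are hypothetical. This reads the printed format only and does not reproduce how hscore or tscore are computed.

```python
import re

# <type (arg1, arg2, ...)> or a bare <type>; the type itself may contain spaces.
TUPLE_RE = re.compile(r'<([^<>()]+?)\s*(?:\(([^<>()]*)\))?>')
LINE_RE = re.compile(r'^•\s*(<[^<>]*>)\s*==\s*(<[^<>]*>)\s*\(([\d.]+)%\)\s*$')


def parse_tuple(text):
    """Split '<color (floor, brown)>' into ('color', ('floor', 'brown'))."""
    match = TUPLE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f'not a tuple: {text!r}')
    kind, args = match.group(1).strip(), match.group(2)
    return kind, tuple(a.strip() for a in args.split(',')) if args else ()


def parse_match_line(line):
    """Parse '• <A> == <B> (90.0%)' into (tuple_a, tuple_b, 0.9)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f'not a match line: {line!r}')
    left, right, percent = match.groups()
    return parse_tuple(left), parse_tuple(right), float(percent) / 100


print(parse_match_line('• <material (floor, hardwood)> == <material (floor, wooden)> (90.0%)'))
# (('material', ('floor', 'hardwood')), ('material', ('floor', 'wooden')), 0.9)
```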
Q: Can you describe the rods on the cage?
A: The rods on the cage are made of steel and are white in color.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'rods' in entity:
                attrs = scene_graph.get_attributes(entity)
                return 'The rods on the cage are made of ' + attrs['material'] + ' and are ' + attrs['color'] + ' in color'
        return None

gpt-4o: The rods on the cage appear to be made of a white, rope-like material. They are vertically aligned and evenly spaced, creating a barrier that encloses the cat within the cage. The frame of the cage is constructed from a solid material, likely wood or metal, and is painted in a light gray color. The overall design is simple and functional, providing a secure enclosure for the cat.
hscore: 85.74%
• <part (cage's rods)> == <part (cage's rods)> (100.0%)
• <material (cage's rods, steel)> == <material (cage's rods, white rope)> (80.0%)
• <color (cage's rods, white)> == <material (cage's rods, white rope)> (70.0%)
tscore: 77.92%
• <whole (cage)> == <whole (cage)> (100.0%)
• <color (cage's frame, light gray)> == <color (cage, gray)> (90.0%)
• <likely material (cage's frame, wood or metal)> == <material (cage, wood)> (80.0%)
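The rods program indexes `attrs['material']` and `attrs['color']` directly. If either attribute is missing from the scene graph, it raises `KeyError` instead of returning `None` like the program's fall-through branch. Below is a hedged variant that returns `None` when an attribute is absent, assuming attributes come back as a plain dict. The function name `describe_rods` and the sample dicts are hypothetical.

```python
def describe_rods(attrs):
    """Build the rods answer from an attribute dict, or None if a field is missing."""
    material = attrs.get('material')
    color = attrs.get('color')
    if not material or not color:
        return None
    return f'The rods on the cage are made of {material} and are {color} in color'


print(describe_rods({'material': 'steel', 'color': 'white'}))
# The rods on the cage are made of steel and are white in color
print(describe_rods({'color': 'white'}))  # None
```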
Image
Caption

A view of a gray concrete block that is sitting on a multicolored rock ground. White cement is visible in between the different rocks. The cement block is in the middle and has two open squares in the middle of it. Behind it, the ground is covered in small stones on the left and right. On the right, there are some patches of dried grass and green weeds. Two black cables are on the ground, going across. Behind the cables is another gray concrete block. Part of one is visible on the left. A gray piece of concrete is in between them in the dirt and stones.

Scene Graph
QA + verification programs
Q: What material can be seen in between the rocks?
A: White cement can be seen in between the rocks.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'cement' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                if 'rocks' in relations and 'in between' in relations['rocks'].get('spatial', ''):
                    return scene_graph.get_attributes(entity).get('color', '') + ' cement'
        return None

gpt-4o: The material seen between the rocks in the image is mortar. Mortar is commonly used in masonry to bind stones, bricks, or concrete blocks together and fill the gaps between them.
hscore: 65.86%
• <whole (cement)> == <whole (mortar)> (80.0%)
• <spatial (cement, rocks, between)> == <spatial (mortar, between rocks)> (80.0%)
• <state (cement, visible)> == <spatial (mortar, between rocks)> (50.0%)
tscore: 83.15%
• <whole (rocks)> == <whole (rocks)> (100.0%)
• <spatial (mortar, between rocks)> == <spatial (cement, rocks, in between)> (80.0%)
• <whole (mortar)> == <whole (cement)> (80.0%)

Q: Describe the entities on the ground over which the cables go across.
A: The black cables go across the ground covered in small stones and concrete.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'cables' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                if 'ground' in relations and 'on' in relations['ground'].get('spatial', ''):
                    return ' and '.join([
                        scene_graph.describe(scene_graph.generate_subgraph([sub_entity]))
                        for sub_entity in ['stones', 'concrete']
                    ]) + ' over which black cables go across'
        return None

gpt-4o: The image shows a few entities on the ground over which the cables go across: 1. **Concrete Blocks**: There are several hollow concrete blocks, some of which are stacked on top of each other. These blocks have rectangular openings and are typically used in construction. 2. **Gravel**: The ground is covered with small gravel stones, providing a rough and uneven surface. 3. **Rocks**: There are larger rocks and stones scattered among the gravel. 4. **Vegetation**: Some small patches of grass and weeds are visible, growing among the gravel and rocks. 5. **Stone Wall**: At the bottom of the image, there is a stone wall made of larger, irregularly shaped stones. This wall appears to be part of a retaining structure. The cables are laid across this mixed terrain of concrete blocks, gravel, rocks, and vegetation.
hscore: 75.61%
• <whole (cables)> == <whole (cables)> (100.0%)
• <spatial (cables, ground, across)> == <spatial (cables, ground, across)> (100.0%)
• <state (ground, covered in small stones)> == <state (rocks, larger)> (70.0%)
tscore: 76.24%
• <whole (rocks)> == <whole (rocks)> (100.0%)
• <whole (cables)> == <whole (cables)> (100.0%)
• <spatial (cables, ground, across)> == <spatial (cables, ground, on)> (90.0%)
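The cables program also calls `generate_subgraph` and `describe`, and their behavior is not specified here. The stub below is a minimal sketch built on two assumptions:

- `generate_subgraph(names)` keeps only the listed entities.
- `describe(graph)` renders each remaining entity as its attribute values followed by its name.

Both method bodies and the sample attributes are hypothetical. They only show how the program composes its answer from the two sub-descriptions.

```python
class SceneGraph:
    """Hypothetical scene graph with the subgraph/describe calls used by the cables program."""

    def __init__(self, attributes, relations):
        self._attributes = attributes  # {entity: {attribute_type: value}}
        self._relations = relations    # {subject: {object: {relation_type: value}}}

    def get_entities(self):
        return list(self._attributes)

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})

    def generate_subgraph(self, names):
        # Assumption: keep only the named entities and the relations among them.
        attrs = {n: self._attributes[n] for n in names if n in self._attributes}
        rels = {s: {o: r for o, r in objs.items() if o in attrs}
                for s, objs in self._relations.items() if s in attrs}
        return SceneGraph(attrs, rels)

    def describe(self, graph):
        # Assumption: "<attribute values> <entity>" for every entity, joined by commas.
        return ', '.join(' '.join([*attrs.values(), name])
                         for name, attrs in graph._attributes.items())


def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        if 'cables' in entity:
            relations = scene_graph.get_outgoing_relations(entity)
            if 'ground' in relations and 'on' in relations['ground'].get('spatial', ''):
                return ' and '.join([
                    scene_graph.describe(scene_graph.generate_subgraph([sub_entity]))
                    for sub_entity in ['stones', 'concrete']
                ]) + ' over which black cables go across'
    return None


sg = SceneGraph(
    attributes={'cables': {'color': 'black'}, 'ground': {},
                'stones': {'size': 'small'}, 'concrete': {'color': 'gray'}},
    relations={'cables': {'ground': {'spatial': 'on'}}},
)
print(execute_command(sg))
# small stones and gray concrete over which black cables go across
```

The sub-entity names `stones` and `concrete` are hard-coded in the program. If the scene graph names these nodes differently, the corresponding sub-description comes back empty.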
Image
Caption

Side-view image of a black Hot Wheels Bone Shaker monster truck on a concrete floor. The truck is from a landing position after a jump with its back wheels off the ground. The truck has a black body panel with an orange under cage and decals of a black and white skull and orange/yellow flames, and black tires with orange inner rims. Behind the truck is a wall of baby-blue tarp with a black and white checkered line, in between two posters of the Hot Wheels logo, partially obscured by the top frame.

Scene Graph
QA + verification programs
Q: What colors are the tires of the monster truck on the concrete floor?
A: The tires of the monster truck are black with orange inner rims.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'tires' in entity:
                attributes = scene_graph.get_attributes(entity)
                if 'truck' in scene_graph.get_outgoing_relations(entity):
                    colors = attributes.get('color', '')
                    return colors.replace(',', ' and ')
        return None

gpt-4o: The tires of the monster truck are black with red rims.
hscore: 79.14%
• <color (monster truck's tires, black)> == <color (tires, black)> (80.0%)
• <part (monster truck's inner rims)> == <part (rims)> (80.0%)
• <color (monster truck's inner rims, orange)> == <color (rims, red)> (60.0%)
tscore: 85.16%
• <color (tires, black)> == <color (truck's tires, black)> (90.0%)
• <part (tires)> == <part (truck's tires)> (80.0%)
• <part (rims)> == <part (truck's inner rims)> (80.0%)
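The tire program turns a comma-separated color string into prose with `colors.replace(',', ' and ')`. That reads cleanly only if the stored value has no space after the comma. The sketch below assumes values such as `'black, orange'` can occur and compares the original approach with a split-and-join that normalizes whitespace. The helper name `join_colors` is hypothetical.

```python
def join_colors(colors):
    """'black, orange inner rims' -> 'black and orange inner rims'; skips empty parts."""
    parts = [part.strip() for part in colors.split(',') if part.strip()]
    return ' and '.join(parts)


value = 'black, orange'
print(repr(value.replace(',', ' and ')))  # 'black and  orange' (two spaces)
print(repr(join_colors(value)))           # 'black and orange'
```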

Q: Is there anything between the black and white checkered line and the tarp?
A: Yes, there are two posters of the Hot Wheels logo between the black and white checkered line and the tarp.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'posters' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                if 'line' in relations and 'tarp' in relations:
                    return scene_graph.get_attributes(entity).get('type', '') + ' are between the line and the tarp.'
        return None

gpt-4o: No, there doesn't appear to be anything between the black and white checkered line and the tarp in the image. The area between them looks clear.
hscore: 42.58%
• <whole (posters)> == <whole (building)> (60.0%)
• <spatial (posters, black and white checkered line, between)> == <part (black and white checkered line)> (50.0%)
• <spatial (posters, tarp, between)> == <part (tarp)> (40.0%)
tscore: 55.43%
• <whole (building)> == <whole (wall)> (70.0%)
• <part (text)> == <image> (70.0%)
• <part (tarp)> == <whole (tarp)> (70.0%)
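The posters program accepts any outgoing relation to `line` and `tarp`, so a poster that is merely `on` the tarp would also pass. The stricter check sketched below requires the spatial value to mention `between` for both targets. It assumes the same `{target: {relation_type: value}}` layout as before. The function name `is_between` and the sample relations are hypothetical.

```python
def is_between(relations, first, second, keyword='between'):
    """True only if both targets carry a spatial relation containing keyword."""
    return all(
        keyword in relations.get(target, {}).get('spatial', '')
        for target in (first, second)
    )


posters = {'line': {'spatial': 'between'}, 'tarp': {'spatial': 'between'}}
logo_on_tarp = {'line': {'spatial': 'next to'}, 'tarp': {'spatial': 'on'}}

print(is_between(posters, 'line', 'tarp'))       # True
print(is_between(logo_on_tarp, 'line', 'tarp'))  # False, although both keys exist
```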

Q: What is on the wall behind the truck?
A: The wall behind the truck has a baby-blue tarp with a black and white checkered line.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'wall' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                for target, rel in relations.items():
                    if target == 'tarp' and 'spatial' in rel:
                        description = scene_graph.get_attributes(target).get('color', '') + ' tarp'
                        line_attr = scene_graph.get_attributes('line')
                        if 'line' in relations:
                            description += ' with a ' + line_attr.get('color', '') + ' line'
                        return description
        return None

gpt-4o: The wall behind the truck has the "Hot Wheels" logo on it. The logo is red and yellow with a flame design, and it appears twice on the wall. There is also a black and white checkered pattern between the two logos.
hscore: 52.48%
• <pattern (tarp's design, checkered line)> == <color (checkered pattern, black)> (60.0%)
• <color (tarp's design, black)> == <color (checkered pattern, black)> (60.0%)
• <color (tarp's design, white)> == <color (checkered pattern, white)> (50.0%)
tscore: 65.57%
• <whole (wall)> == <whole (wall)> (100.0%)
• <color (checkered pattern, white)> == <color (line, black and white checkered)> (80.0%)
• <color (checkered pattern, black)> == <color (line, black and white checkered)> (80.0%)
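The wall program fetches the attributes of `line` before checking that the wall has a relation to it. The reordered sketch below performs that lookup only when the relation exists. It also shows the answer the program builds from the caption's colors. It assumes plain-dict attributes and relations as in the earlier programs. The function `describe_wall` and its dict arguments are hypothetical stand-ins for the scene-graph calls.

```python
def describe_wall(relations, attributes):
    """Mirror the wall program: '<tarp color> tarp[ with a <line color> line]'."""
    if 'spatial' not in relations.get('tarp', {}):
        return None
    description = attributes.get('tarp', {}).get('color', '') + ' tarp'
    if 'line' in relations:  # look up the line only when the wall relates to it
        description += ' with a ' + attributes.get('line', {}).get('color', '') + ' line'
    return description


wall_relations = {'tarp': {'spatial': 'covered by'}, 'line': {'spatial': 'on'}}
attrs = {'tarp': {'color': 'baby-blue'}, 'line': {'color': 'black and white checkered'}}
print(describe_wall(wall_relations, attrs))
# baby-blue tarp with a black and white checkered line
```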

Q: Can you describe the state of the truck during the jump?
A: During the jump, the truck's back wheels are off the ground, and it is in a landing position.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'back wheels' in entity:
                state = scene_graph.get_attributes(entity).get('state', '')
                if 'truck' in scene_graph.get_outgoing_relations(entity):
                    return "The truck is {} and its back wheels are {}".format(
                        scene_graph.get_attributes('truck').get('state', ''), state)
        return None

gpt-4o: The truck in the image appears to be a monster truck, and it is captured mid-jump. The front wheels are elevated off the ground, indicating that the truck is airborne. The rear wheels are still in contact with the ground, suggesting that the truck is in the process of taking off. The truck's suspension is visibly extended, and the body is tilted slightly upward. The background shows a "Hot Wheels" logo, indicating that this might be part of a stunt or a show. The truck's design features a black body with a prominent flame and skull graphic on the side.
hscore: 87.68%
• <whole (truck)> == <whole (truck)> (100.0%)
• <spatial (truck's back wheels, ground, off)> == <spatial (truck, ground, elevated)> (80.0%)
• <state (truck, in a jump)> == <state (truck, taking off)> (80.0%)
tscore: 79.43%
• <whole (truck)> == <whole (truck)> (100.0%)
• <state (truck, airborne)> == <state (truck, jump)> (80.0%)
• <spatial (truck, background, in)> == <spatial (truck, wall, in front of)> (80.0%)
Image
Caption

An outdoor medium close-up front view of home with a gazebo type entrance, the gazebo is completely covered with vertical vines that have green leaves attached to them. The gazebo has a pointed roof and is made up of dark colored shingles, along with the house's roof. The house walls are also partially covered in hanging vines that have green leaves attached to them. The ground floor towards the bottom consists of small green plants and dried up leaves scattered throughout the surface. Above the home and behind it, are several tall trees with green leaves attached to their branches. The bright white sky can be seen above the trees and in between their branches.

Scene Graph
QA + verification programs
Q: How are the vines oriented on the gazebo?
A: The vines are vertically oriented on the gazebo and completely cover it.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'vines' in entity:
                outgoing_relations = scene_graph.get_outgoing_relations(entity)
                if 'gazebo' in outgoing_relations and 'attached to' in outgoing_relations['gazebo'].get('spatial', ''):
                    return scene_graph.get_attributes(entity).get('orientation', '')
        return None

gpt-4o: The vines on the structure are oriented in a way that they cover a significant portion of the building. They are growing from the ground up, climbing the walls, and extending over the roof. The vines appear to be dense and thick, particularly around the entrance and the roof, creating a cascading effect. The growth is quite extensive, with the vines spreading outwards and upwards, covering windows and other parts of the structure.
hscore: 78.8%
• <spatial (vines, gazebo, cover)> == <spatial (vines, windows, covering)> (90.0%)
• <part (vines)> == <whole (vines)> (80.0%)
• <orientation (vines, vertically)> == <spatial (vines, ground, from)> (60.0%)
tscore: 86.06%
• <whole (vines)> == <whole (vines)> (100.0%)
• <spatial (vines, building, climbing)> == <spatial (vines, gazebo, attached to)> (80.0%)
• <state (vines, extensive)> == <state (vines, hang)> (80.0%)

Q: What color is the sky in the scene above the trees?
A: The sky above the trees is bright white.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'sky' in entity:
                spatial_relations = scene_graph.get_outgoing_relations(entity)
                if 'trees' in spatial_relations and 'above' in spatial_relations['trees'].get('spatial', ''):
                    return scene_graph.get_attributes(entity).get('color', '')
        return None

gpt-4o: The sky in the scene above the trees appears to be overcast or cloudy, as it is a light gray color.
hscore: 62.12%
• <color (sky, bright white)> == <color (sky, light gray)> (70.0%)
• <spatial (sky, above the trees, above)> == <state (sky, overcast)> (50.0%)
tscore: 63.94%
• <color (sky, light gray)> == <color (sky, bright white)> (70.0%)
• <state (sky, overcast)> == <color (sky, bright white)> (60.0%)
• <state (sky, cloudy)> == <color (sky, bright white)> (50.0%)

Q: What significant feature distinguishes the gazebo's roof?
A: The gazebo's roof is characterized by its pointed shape.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "gazebo's roof" in entity:
                attributes = scene_graph.get_attributes(entity)
                return attributes.get('shape', '')
        return None

gpt-4o: The significant feature that distinguishes the roof of the gazebo in the image is the extensive growth of vines and vegetation covering it. The vines have grown thickly over the roof, creating a natural, overgrown appearance.
hscore: 74.96%
• <shape (gazebo's roof, pointed)> == <part (gazebo's roof)> (70.0%)
tscore: 86.76%
• <whole (gazebo)> == <whole (gazebo)> (100.0%)
• <part (gazebo's roof)> == <part (gazebo's roof)> (100.0%)
• <whole (vines and vegetation)> == <whole (vines)> (80.0%)
Image
Caption

Large dark gray clouds are looming low over the top of a highway full of cars. A concrete median is in the middle of the highway. Trees are on each side of the highway, with streetlight posts beside them. A yellow sign with an arrow on it pointing upwards is on the right side of the highway. Large power lines towers are in on the left side of the highway. A large metal structure with signs on it is over the highway, with cars passing underneath it. The image is taken through a dirty glass window of a car on the highway.

Scene Graph
QA + verification programs
Q: What color are the clouds looming over the highway?
A: The clouds looming over the highway are dark gray in color.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'clouds' in entity:
                attributes = scene_graph.get_attributes(entity)
                if 'dark gray' in attributes.get('color', ''):
                    return 'dark gray'
        return None

gpt-4o: The clouds looming over the highway are dark gray, indicating stormy or rainy weather.
hscore: 88.38%
• <state (clouds, looming)> == <state (clouds, looming)> (100.0%)
• <state (clouds, dark gray in color)> == <color (clouds, dark gray)> (80.0%)
• <state (clouds, over)> == <state (clouds, looming)> (80.0%)
tscore: 89.42%
• <color (clouds, dark gray)> == <color (clouds, dark gray)> (100.0%)
• <whole (clouds)> == <whole (clouds)> (100.0%)
• <whole (highway)> == <whole (highway)> (100.0%)

Q: What's the state of the glass window through which the image is taken?
A: The glass window through which the image is taken is in a dirty state.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'glass window' in entity:
                attributes = scene_graph.get_attributes(entity)
                return attributes.get('state', '')
        return None

gpt-4o: The glass window through which the image is taken appears to be wet, likely due to rain. There are visible water droplets and streaks on the glass, indicating that it is raining or has recently rained.
hscore: 75.08%
• <state (glass window, dirty)> == <state (window, glass)> (70.0%)
tscore: 69.55%
• <state (environment, recently rained)> == <image> (70.0%)
• <state (window, water droplets on)> == <image> (70.0%)
• <state (environment, raining)> == <image> (70.0%)

Q: Where is the concrete median located?
A: The concrete median is located in the middle of the highway.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'median' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                attributes = scene_graph.get_attributes(entity)
                if 'highway' in relations and 'in the middle of' in relations['highway'].get('spatial', ''):
                    return f"The concrete median is in the middle of the highway."
        return None

gpt-4o: The concrete median is located in the center of the highway, separating the lanes of traffic moving in opposite directions.
hscore: 97.72%
• <whole (median)> == <whole (median)> (100.0%)
• <material (median, concrete)> == <material (median, concrete)> (100.0%)
• <spatial (median, highway, in the middle of)> == <spatial (median, highway, in)> (90.0%)
tscore: 93.08%
• <whole (median)> == <whole (median)> (100.0%)
• <material (median, concrete)> == <material (median, concrete)> (100.0%)
• <spatial (median, highway, in)> == <spatial (median, highway, in the middle of)> (90.0%)

Q: What kinds of structures are on either side of the highway?
A: There are trees on each side of the highway, and power line towers are on the left side.
    def execute_command(scene_graph):
        left_side_structures = []
        both_sides_structures = []
        for entity in scene_graph.get_entities():
            relations = scene_graph.get_outgoing_relations(entity)
            if 'highway' in relations:
                spatial_relation = relations['highway'].get('spatial', '')
                if 'on each side of' in spatial_relation:
                    both_sides_structures.append(entity)
                elif 'on the left side of' in spatial_relation:
                    left_side_structures.append(entity)
        return {
            "both_sides": both_sides_structures,
            "left_side": left_side_structures
        }

gpt-4o: On either side of the highway in the image, there are several structures and features visible: - On the left side, there are trees and vegetation, as well as a power line tower. - On the right side, there are also trees and vegetation. - In the distance, there are more power line towers and what appears to be some industrial or commercial buildings. The overall scene shows a highway with surrounding greenery and some infrastructure elements like power lines and distant buildings.
hscore: 76.73%
• <part (power line towers)> == <part (power line tower)> (90.0%)
• <spatial (power line towers, highway, on the left side)> == <spatial (power line tower, highway's left side, on)> (90.0%)
• <spatial (trees, highway, on each side of)> == <spatial (power line tower, highway's left side, on)> (60.0%)
tscore: 78.52%
• <whole (highway)> == <whole (highway)> (100.0%)
• <spatial (power line tower, highway's left side, on)> == <spatial (power lines towers, highway, on the left side of)> (90.0%)
• <part (distant power line towers)> == <whole (power lines towers)> (80.0%)
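Unlike the other programs, the highway program returns a dict of entity lists rather than a sentence. Comparing it with the reference answer therefore needs a rendering step. A minimal sketch of such a step is below, assuming the returned lists hold plain entity names. The function `render_sides` and its sentence template are hypothetical.

```python
def render_sides(result):
    """Turn {'both_sides': [...], 'left_side': [...]} into a one-sentence answer."""
    clauses = []
    if result.get('both_sides'):
        clauses.append(' and '.join(result['both_sides']) + ' on each side of the highway')
    if result.get('left_side'):
        clauses.append(' and '.join(result['left_side']) + ' on the left side')
    if not clauses:
        return None
    return 'There are ' + ', and '.join(clauses) + '.'


print(render_sides({'both_sides': ['trees'], 'left_side': ['power line towers']}))
# There are trees on each side of the highway, and power line towers on the left side.
```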
Image
Caption

An indoor side view of a bed in the center of a dim lit room, the bed has white mattress covers and white sheets, the bed also has a dark colored frame and headboard. There are two dark colored night stands on both sides of the bed, each with an identical lamp that has a large arm and a cubed shaped shader, the lamp to the left of the bed is on while the right one is off. About four feet in front of the bed is a TV console that has TV placed on top of it along with a rectangular shaped mirror.

Scene Graph
QA + verification programs
Q: What is the shape of the lamp shader?
A: The lamp shader has a cubed shape.
    def execute_command(scene_graph):
        lamp_shader_attributes = scene_graph.get_attributes("lamps' shader")
        return lamp_shader_attributes.get('shape', 'unknown')

gpt-4o: The lamp shades in the image are trapezoidal in shape.
hscore: 67.19%
• <shape (lamp, cubed)> == <shape (lamp shades, trapezoidal)> (70.0%)
• <whole (lamp)> == <part (lamp shades)> (60.0%)
tscore: 70.15%
• <part (lamp shades)> == <image> (70.0%)
• <shape (lamp shades, trapezoidal)> == <shape (lamps' shader, cubed)> (60.0%)
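To show what the lamp-shader program actually checks, here is a minimal sketch that runs it against a toy in-memory graph. The `SceneGraph` class and its dict-of-dicts storage are hypothetical stand-ins. Only the method name `get_attributes` comes from the programs in this appendix, and the real interface may differ.

```python
class SceneGraph:
    """Hypothetical stand-in for the scene-graph API used by the programs."""

    def __init__(self, attributes):
        # entity name -> {attribute name: value}
        self._attributes = attributes

    def get_attributes(self, entity):
        # Unknown entities yield an empty dict, so .get() defaults apply.
        return self._attributes.get(entity, {})


def execute_command(scene_graph):
    lamp_shader_attributes = scene_graph.get_attributes("lamps' shader")
    return lamp_shader_attributes.get('shape', 'unknown')


graph = SceneGraph({"lamps' shader": {"shape": "cubed"}})
print(execute_command(graph))            # cubed
print(execute_command(SceneGraph({})))   # unknown
```

Run against the annotated graph, the program yields `cubed`, which contradicts the gpt-4o answer ("trapezoidal"). The `'unknown'` default appears only when the node or its `shape` attribute is missing.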

Q: How many night stands are in the room, and what do they look like?
A: There are two dark colored night stands on both sides of the bed.
    def execute_command(scene_graph):
        night_stands_attributes = scene_graph.get_attributes('night stands')

        if 'count' in night_stands_attributes and 'color' in night_stands_attributes:
            return (f"There are {night_stands_attributes['count']} "
                    f"{night_stands_attributes.get('color')} night stands on both sides of the bed.")

        return None

gpt-4o: There are two nightstands in the room. They are both black with a single drawer and a lower shelf. Each nightstand has a lamp with a rectangular shade on it.
hscore: 69.36%
• <count (night stands, ==2)> == <count (nightstands, ==2)> (70.0%)
• <spatial (night stands, bed, on both sides of)> == <spatial (lamps, nightstands, on)> (70.0%)
• <color (night stands, dark)> == <color (nightstands, black)> (70.0%)
tscore: 76.23%
• <whole (room)> == <whole (room)> (100.0%)
• <spatial (lamps, nightstands, on)> == <spatial (lamps, night stands, on)> (90.0%)
• <shape (lamps' shade, rectangular)> == <image> (70.0%)
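The night-stand program rebuilds the reference answer only when both `count` and `color` are annotated, and otherwise abstains with `None`. The sketch below runs it on a hypothetical graph, storing the count as an integer and the color as "dark colored" (the caption's wording). Both storage choices are assumptions.

```python
class SceneGraph:
    """Hypothetical stand-in: attribute lookup only."""

    def __init__(self, attributes):
        self._attributes = attributes  # entity -> {attribute: value}

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})


def execute_command(scene_graph):
    night_stands_attributes = scene_graph.get_attributes('night stands')
    # Both attributes must be annotated; otherwise the program abstains (None).
    if 'count' in night_stands_attributes and 'color' in night_stands_attributes:
        return (f"There are {night_stands_attributes['count']} "
                f"{night_stands_attributes['color']} night stands on both sides of the bed.")
    return None


full = SceneGraph({'night stands': {'count': 2, 'color': 'dark colored'}})
partial = SceneGraph({'night stands': {'count': 2}})
print(execute_command(full))     # There are 2 dark colored night stands on both sides of the bed.
print(execute_command(partial))  # None
```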
Image
Caption

A close up shot of the back of a statue of a man facing a chain link fence. The statue has orange hair and a cap made of leaves on its head. The statue is wearing a lime green jersey with 2 white strips on its arm sleeves and one white stripe on the upper part of the jersey. "LOST BOY / 01" is written in large white text. The statue has a belt composed of an alternating triangle pattern of blue and light blue. White and orange flowers are visible through the fence. Trees and a white house are to the left of the statue. The sky is bright with large white clouds.

Scene Graph
    Scene Graph
QA + verification programs
Q: What is the color of the hair on the statue facing the fence?
A: The hair on the statue facing the fence is orange.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "statue" in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                if "fence" in relations and "face" in relations["fence"].get("spatial", ""):
                    statue_hair = scene_graph.get_outgoing_relations(entity + "'s hair")
                    if "statue" in statue_hair:
                        return scene_graph.get_attributes(entity + "'s hair").get("color", "")
        return None

gpt-4o: The hair on the statue facing the fence is green on top and orange below.
hscore: 92.03%
• <color (statue's hair, orange)> == <color (statue's hair, below, orange)> (90.0%)
• <state (statue, facing fence)> == <state (statue, facing the fence)> (90.0%)
• <state (statue, facing)> == <state (statue, facing the fence)> (80.0%)
tscore: 90%
• <part (statue's hair)> == <part (statue's hair)> (100.0%)
• <whole (statue)> == <whole (statue)> (100.0%)
• <color (statue's hair, below, orange)> == <color (statue's hair, orange)> (90.0%)
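The statue-hair program chains two graph lookups. First it needs a statue whose edge to the fence carries a spatial label containing "face". Then it needs a hair node that points back to the statue. The sketch below builds a hypothetical graph for both steps. The storage layout and the `{"part": "of"}` edge are assumptions; only the method names come from the program.

```python
class SceneGraph:
    """Hypothetical stand-in with attributes and labelled outgoing edges."""

    def __init__(self, attributes, relations):
        self._attributes = attributes  # entity -> {attribute: value}
        self._relations = relations    # entity -> {target: {relation type: label}}

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})


def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        if "statue" in entity:
            relations = scene_graph.get_outgoing_relations(entity)
            if "fence" in relations and "face" in relations["fence"].get("spatial", ""):
                hair = entity + "'s hair"
                # The hair node must point back to its statue (a part-of edge).
                if "statue" in scene_graph.get_outgoing_relations(hair):
                    return scene_graph.get_attributes(hair).get("color", "")
    return None


def build(spatial_label):
    return SceneGraph(
        attributes={"statue": {}, "statue's hair": {"color": "orange"}},
        relations={
            "statue": {"fence": {"spatial": spatial_label}},
            "statue's hair": {"statue": {"part": "of"}},
        },
    )


print(execute_command(build("face")))    # orange
print(execute_command(build("facing")))  # None
```

The check relies on plain substring matching. An edge stored as "facing", the caption's wording, would not pass, because "face" is not a substring of "facing".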

Q: What is written on the statue's jersey?
A: "LOST BOY / 01" is written on the statue's jersey in large white text.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "word" in entity:
                return scene_graph.get_attributes(entity).get("text rendering", "")
        return None

gpt-4o: The jersey on the statue has "LOST BOY" written on it, along with the number "01".
hscore: 51.14%
• <text rendering (statue's jersey, "LOST BOY / 01")> == <text rendering (jersey, "LOST BOY")> (80.0%)
• <state (text, large)> == <text rendering (jersey, "LOST BOY")> (40.0%)
• <color (text, white)> == <text rendering (jersey, "LOST BOY")> (30.0%)
tscore: 81.25%
• <whole (statue)> == <whole (statue)> (100.0%)
• <text rendering (jersey, "LOST BOY")> == <image> (70.0%)
• <part (jersey)> == <image> (70.0%)

Q: What objects are found to the left of the statue?
A: Trees and a white house are to the left of the statue.
    def execute_command(scene_graph):
        objects_found = []
        for entity in scene_graph.get_entities():
            if "house" in entity or "trees" in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                if "statue" in relations and "left of" in relations["statue"].get("spatial", ""):
                    description = scene_graph.describe(scene_graph.generate_subgraph([entity]))
                    objects_found.append(description)
        return " and ".join(objects_found) if objects_found else None

gpt-4o: To the left of the statue, there are some trees and a fence. Additionally, there are some orange objects that appear to be part of a structure or installation in the background.
hscore: 80.04%
• <whole (trees)> == <whole (trees)> (100.0%)
• <spatial (statue, trees and white house, to the left of)> == <spatial (trees, statue, to the left of)> (80.0%)
• <whole (white house)> == <whole (fence)> (50.0%)
tscore: 78.25%
• <whole (fence)> == <whole (fence)> (100.0%)
• <whole (statue)> == <whole (statue)> (100.0%)
• <whole (trees)> == <whole (trees)> (100.0%)
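Unlike the single-lookup programs, the "left of the statue" program collects every matching entity and verbalizes each one through `generate_subgraph` and `describe`. The appendix does not show how those two methods behave. The toy versions below are hypothetical: a subgraph is just the selected entities with their attributes, and `describe` prefixes a color when one is annotated.

```python
class SceneGraph:
    """Hypothetical stand-in; describe() and generate_subgraph() are toy versions."""

    def __init__(self, attributes, relations):
        self._attributes = attributes
        self._relations = relations

    def get_entities(self):
        return list(self._attributes)

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})

    def generate_subgraph(self, entities):
        # Toy subgraph: the selected entities with their attributes.
        return {e: self._attributes.get(e, {}) for e in entities}

    def describe(self, subgraph):
        # Toy verbalizer: prefix each entity with its color when one is annotated.
        return ", ".join(
            f"{attrs['color']} {e}" if "color" in attrs else e
            for e, attrs in subgraph.items()
        )


def execute_command(scene_graph):
    objects_found = []
    for entity in scene_graph.get_entities():
        if "house" in entity or "trees" in entity:
            relations = scene_graph.get_outgoing_relations(entity)
            if "statue" in relations and "left of" in relations["statue"].get("spatial", ""):
                subgraph = scene_graph.generate_subgraph([entity])
                objects_found.append(scene_graph.describe(subgraph))
    return " and ".join(objects_found) if objects_found else None


graph = SceneGraph(
    attributes={"trees": {}, "house": {"color": "white"}, "statue": {}, "fence": {}},
    relations={
        "trees": {"statue": {"spatial": "to the left of"}},
        "house": {"statue": {"spatial": "to the left of"}},
        "fence": {"statue": {"spatial": "in front of"}},
    },
)
print(execute_command(graph))  # trees and white house
```

The fence never reaches the spatial check, because the program filters entities by name ("house" or "trees") before it inspects any relations.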
Image
Caption

A medium-close-up view of brown and black print sunglasses with black lenses sitting on a light brown wooden table. Behind it is a light gray block laid sideways. A sketch of a girl, drawn four times, each facing a different direction. She has short hair with goggles on her head, and she's wearing a knit sweater with flowers on it and cargo pants with a choker. She has a hook for a right hand. Beside the sketch is a color wheel palette with different shades of blue, green, light blue, yellow, brown, and pink. A light gray wall is behind the items.

Scene Graph
    Scene Graph
QA + verification programs
Q: What is beside the sketch of the girl?
A: Beside the sketch of the girl is a color wheel palette with different shades of colors.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'color wheel palette' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                # .get() avoids a KeyError when the edge has no 'spatial' label
                if 'sketch' in relations and 'beside' in relations['sketch'].get('spatial', ''):
                    colors = scene_graph.get_attributes(entity).get('color', '').replace(',', ', ')
                    return (f"Beside the sketch of the girl is a color wheel palette "
                            f"with different shades of {colors}.")
        return None

gpt-4o: Beside the sketch of the girl, there is a color wheel. The color wheel is partially covering the sketchbook.
hscore: 82.04%
• <spatial (color wheel palette, girl's sketch, beside)> == <spatial (color wheel, sketch of the girl, beside)> (90.0%)
• <whole (girl's sketch)> == <whole (sketch of the girl)> (80.0%)
• <whole (color wheel palette)> == <whole (color wheel)> (80.0%)
tscore: 81.7%
• <whole (sketchbook)> == <whole (sketch)> (80.0%)
• <spatial (color wheel, sketchbook, partially covering)> == <spatial (color wheel palette, sketch, beside)> (80.0%)
• <whole (color wheel)> == <whole (color wheel palette)> (80.0%)
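In the color-wheel program, indexing an edge as `relations['sketch']['spatial']` raises `KeyError` whenever the edge carries no spatial label. Using `.get('spatial', '')` turns that case into a clean `None`. The sketch below exercises both cases on a hypothetical graph. It assumes colors are stored as one comma-separated string without spaces, which is what the `.replace(',', ', ')` step appears to expect.

```python
class SceneGraph:
    """Hypothetical stand-in with attributes and labelled outgoing edges."""

    def __init__(self, attributes, relations):
        self._attributes = attributes
        self._relations = relations

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})


def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        if 'color wheel palette' in entity:
            relations = scene_graph.get_outgoing_relations(entity)
            # .get() returns '' for an unlabelled edge instead of raising KeyError.
            if 'sketch' in relations and 'beside' in relations['sketch'].get('spatial', ''):
                colors = scene_graph.get_attributes(entity).get('color', '').replace(',', ', ')
                return ("Beside the sketch of the girl is a color wheel palette "
                        f"with different shades of {colors}.")
    return None


attributes = {'color wheel palette': {'color': 'blue,green,light blue,yellow,brown,pink'}}
beside = SceneGraph(attributes, {'color wheel palette': {'sketch': {'spatial': 'beside'}}})
unlabelled = SceneGraph(attributes, {'color wheel palette': {'sketch': {}}})
print(execute_command(beside))
print(execute_command(unlabelled))  # None
```

If the colors were already stored with spaces ("blue, green"), the same `replace` would produce double spaces. The formatting step therefore depends on how the annotation is stored.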

Q: Where is the wall in relation to the other items?
A: The wall is behind the items.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'wall' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                # .get() avoids a KeyError when the edge has no 'spatial' label
                if 'items' in relations and 'behind' in relations['items'].get('spatial', ''):
                    return "The wall is behind the items."
        return None

gpt-4o: The wall is behind all the other items in the image. The items, including the sketchbook, color wheel, rectangular object, and sunglasses, are placed on a surface in front of the wall.
hscore: 57.07%
• <spatial (items, wall, behind)> == <spatial (wall, other items, behind)> (90.0%)
• <part (wall)> == <whole (wall)> (80.0%)
• <whole (building)> == <whole (wall)> (70.0%)
tscore: 85.13%
• <whole (wall)> == <whole (wall)> (100.0%)
• <whole (sunglasses)> == <whole (sunglasses)> (90.0%)
• <spatial (wall, other items, behind)> == <spatial (wall, items, behind)> (90.0%)

Q: What colors are the lenses of the sunglasses?
A: The lenses of the sunglasses are black.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "sunglasses' lenses" in entity:
                color = scene_graph.get_attributes(entity).get('color', '')
                return f"The lenses of the sunglasses are {color}."
        return None

gpt-4o: The lenses of the sunglasses in the image are dark, likely black or a very dark shade of brown.
hscore: 96.87%
• <part (sunglasses)> == <part (sunglasses)> (100.0%)
• <color (sunglasses' lenses, black)> == <color (sunglasses' lenses, black or dark brown)> (90.0%)
tscore: 85.42%
• <color (sunglasses' lenses, black or dark brown)> == <color (sunglasses' lenses, black)> (90.0%)
• <color (sunglasses' lenses, dark)> == <color (sunglasses' lenses, black)> (90.0%)
• <state (sunglasses, worn)> == <state (sunglasses, sit)> (80.0%)

Q: What is shown on the girl's head in the sketch?
A: The girl has goggles on her head.
    def execute_command(scene_graph):
        return "The girl has goggles on her head."

gpt-4o: In the sketch, the girl is shown wearing goggles on her head.
hscore: 96.95%
• <part (girl's goggles)> == <part (girl's goggles)> (100.0%)
• <state (girl's goggles, on her head)> == <state (girl's goggles, on her head)> (90.0%)
tscore: 84.87%
• <whole (girl)> == <whole (girl)> (100.0%)
• <state (girl's goggles, on her head)> == <state (she's goggles, wear)> (80.0%)
• <part (girl's goggles)> == <part (she's head)> (60.0%)
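The goggles program returns a fixed string and never consults the graph, so it cannot fail. Below is a sketch of a variant that looks the goggles up first. The entity name "girl's goggles" and the `state` value "on her head" are borrowed from the score tuples. The `SceneGraph` stand-in and its storage are hypothetical.

```python
class SceneGraph:
    """Hypothetical stand-in: entity names plus attributes."""

    def __init__(self, attributes):
        self._attributes = attributes

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})


def execute_command(scene_graph):
    # Look the goggles up instead of returning a fixed string, so the check
    # can fail when the graph does not support the answer.
    for entity in scene_graph.get_entities():
        if "goggles" in entity and "head" in scene_graph.get_attributes(entity).get("state", ""):
            return "The girl has goggles on her head."
    return None


with_goggles = SceneGraph({"girl": {}, "girl's goggles": {"state": "on her head"}})
without = SceneGraph({"girl": {}})
print(execute_command(with_goggles))  # The girl has goggles on her head.
print(execute_command(without))       # None
```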
Image
Caption

A view of black seats in a movie theater. They are in rows and have numbers at each table on them. The wall is red, and there is a light on the left wall that is shaped like a film reel. There is a projector window on the back wall over the seats. Black speakers are attached to the back wall. There is a blue light shining out on the wall, casting a film reel shaped shadow. Steps are visible on the right of the rows of chairs.

Scene Graph
QA + verification programs
Q: Where is the projector window in relation to the seats?
A: The projector window is over the seats.
    def execute_query(scene_graph):
        for entity in scene_graph.get_entities():
            if 'projector window' in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                # .get() avoids a KeyError when the edge has no 'spatial' label
                if 'seats' in relations_out and 'over' in relations_out['seats'].get('spatial', ''):
                    return 'over the seats'
        return None

gpt-4o: In the image, the projector window is located on the back wall of the theater, above the seats. It is positioned centrally and elevated, ensuring that the projection can cover the entire screen at the front of the theater.
hscore: 97.47%
• <spatial (projector window, seats, over)> == <spatial (projector window, seats, above)> (90.0%)
tscore: 86.07%
• <spatial (projector window, back wall, on)> == <spatial (projector window, back wall, on)> (100.0%)
• <spatial (projector window, seats, above)> == <spatial (projector window, seats, over)> (90.0%)
• <spatial (projector window, back wall, elevated)> == <spatial (projector window, back wall, on)> (90.0%)
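The projector-window program matches the literal substring 'over'. An edge labelled "above" (gpt-4o's wording, which the tuple matcher scored at 90.0%) would therefore fail the check. The sketch below adds a small synonym table to show one way of loosening the comparison. The `SPATIAL_SYNONYMS` table and `spatial_matches` helper are hypothetical and not part of the benchmark.

```python
class SceneGraph:
    """Hypothetical stand-in with labelled outgoing edges."""

    def __init__(self, relations):
        self._relations = relations  # entity -> {target: {relation type: label}}

    def get_entities(self):
        return list(self._relations)

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})


# Hypothetical synonym table; not part of the benchmark.
SPATIAL_SYNONYMS = {"over": ("over", "above")}


def spatial_matches(label, wanted):
    """True if any accepted wording of `wanted` occurs in the edge label."""
    return any(word in label for word in SPATIAL_SYNONYMS.get(wanted, (wanted,)))


def execute_query(scene_graph):
    for entity in scene_graph.get_entities():
        if 'projector window' in entity:
            relations_out = scene_graph.get_outgoing_relations(entity)
            if 'seats' in relations_out and spatial_matches(
                    relations_out['seats'].get('spatial', ''), 'over'):
                return 'over the seats'
    return None


for label in ("over", "above", "below"):
    graph = SceneGraph({'projector window': {'seats': {'spatial': label}}})
    print(label, execute_query(graph))
# over over the seats
# above over the seats
# below None
```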

Q: What is on the right of the rows of chairs?
A: Steps are visible on the right of the rows of chairs.
    def execute_query(scene_graph):
        for entity in scene_graph.get_entities():
            if 'steps' in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                # .get() avoids a KeyError when the edge has no 'spatial' label
                if ('rows of chairs' in relations_out
                        and 'on the right of' in relations_out['rows of chairs'].get('spatial', '')):
                    return 'steps'
        return None

gpt-4o: To the right of the rows of chairs, there is a staircase that provides access to the different rows of seats in the theater.
hscore: 79.64%
• <spatial (steps, rows of chairs, right of)> == <spatial (staircase, rows of chairs, to the right of)> (90.0%)
• <whole (steps)> == <whole (staircase)> (60.0%)
tscore: 86.59%
• <whole (rows of chairs)> == <whole (rows of chairs)> (100.0%)
• <spatial (staircase, rows of chairs, to the right of)> == <spatial (steps, rows of chairs, on the right of)> (90.0%)
• <whole (staircase)> == <image> (70.0%)

Q: Are the seats in the movie theater colored? If so, what color are they?
A: Yes, the seats in the movie theater are black.
    def execute_query(scene_graph):
        for entity in scene_graph.get_entities():
            if 'seats' in entity:
                color = scene_graph.get_attributes(entity).get('color', '')
                if color:
                    return color
                else:
                    return "No color is specified."
        return None

gpt-4o: Yes, the seats in the movie theater are colored. They are black.
hscore: 97.05%
• <color (seats, black)> == <color (seats, black)> (90.0%)
tscore: 84.54%
• <color (seats, black)> == <color (seats, black)> (90.0%)
• <part (seats)> == <whole (seats)> (80.0%)
• <state (seats, colored)> == <image> (70.0%)
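The seats program returns on the first entity whose name contains 'seats'. Suppose a node such as "seats' numbers" came first; this name is hypothetical, prompted by the caption's numbered seats. The program would then answer "No color is specified." even though the seats are black. The variant sketched below reports a missing color only after it has checked every matching entity. The `SceneGraph` stand-in is hypothetical.

```python
class SceneGraph:
    """Hypothetical stand-in: entity names plus attributes."""

    def __init__(self, attributes):
        self._attributes = attributes

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})


def execute_query(scene_graph):
    matched = False
    for entity in scene_graph.get_entities():
        if 'seats' in entity:
            matched = True
            color = scene_graph.get_attributes(entity).get('color', '')
            if color:
                return color
    # Report a missing color only after every matching entity was checked.
    return "No color is specified." if matched else None


graph = SceneGraph({"seats' numbers": {}, 'seats': {'color': 'black'}})
print(execute_query(graph))                                   # black
print(execute_query(SceneGraph({'seats': {}})))               # No color is specified.
print(execute_query(SceneGraph({'wall': {'color': 'red'}})))  # None
```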