Image
Caption

An overhead aerial view of a landscape covered in green with hills throughout the image seen from the vantage point of an airplane. There is a group of cumulus clouds on the left side of the image casting a shadow over the land on the right side of the image. Extending vertically from the top of the image to the bottom of the image is a road with slight bends in it as it moves along in a general direction.

Scene Graph
QA + verification programs
Q: Where are the cumulus clouds located in the image?
A: The cumulus clouds are on the left side of the image.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'cumulus clouds' in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                for target_entity, relation_info in relations_out.items():
                    if 'left side of the image' in target_entity:
                        return target_entity
        return None

gpt-4o: The cumulus clouds in the image are located in the upper left quadrant. They appear as fluffy, white clouds casting shadows on the ground below.
hscore: 92.17%
• <whole (cumulus clouds)> == <whole (cumulus clouds)> (100.0%)
• <state (cumulus clouds, on the left side)> == <state (cumulus clouds, fluffy)> (80.0%)
tscore: 83.44%
• <whole (cumulus clouds)> == <whole (cumulus clouds)> (100.0%)
• <spatial (cumulus clouds, upper left quadrant, in)> == <spatial (cumulus clouds, left side of the image, on)> (80.0%)
• <spatial (cumulus clouds' shadows, ground, on)> == <(cumulus clouds, shadow, cast)> (80.0%)

Q: Do the clouds have any specific grouping in the image?
A: Yes, the clouds are grouped as cumulus clouds.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'cumulus clouds' in entity:
                return scene_graph.get_attributes(entity).get('state', '')
        return None

gpt-4o: In the image, the clouds appear to be grouped in clusters. There are several distinct clusters of clouds, with some larger and more dense than others. The clouds are scattered across the landscape, with noticeable gaps between the clusters. The arrangement suggests a somewhat random distribution, typical of cumulus clouds, which often form in groups due to localized rising air currents.
hscore: 86.85%
• <type (clouds, cumulus)> == <type (clouds, cumulus)> (90.0%)
• <spatial (clouds, grouped)> == <state (clouds, grouped)> (70.0%)
tscore: 76.36%
• <whole (clouds)> == <whole (cumulus clouds)> (80.0%)
• <state (clouds, grouped)> == <state (cumulus clouds, group)> (80.0%)
• <state (cloud clusters, distinct)> == <image> (70.0%)
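
The verification programs in this figure call methods such as get_entities, get_attributes, get_outgoing_relations, and get_incoming_relations on a scene_graph object whose implementation is not shown here. As a minimal sketch, assuming entities are plain strings and that attributes and relations are nested dictionaries keyed by type, the first program above can be run against a small stand-in. The SceneGraph class, its add_entity and add_relation helpers, and the sample graph are all hypothetical and are not the authors' implementation.

```python
from collections import defaultdict


class SceneGraph:
    """Hypothetical stand-in for the scene-graph object passed to the programs."""

    def __init__(self):
        self._attributes = {}               # entity -> {attribute type: value}
        self._outgoing = defaultdict(dict)  # source -> {target: {relation type: value}}
        self._incoming = defaultdict(dict)  # target -> {source: {relation type: value}}

    def add_entity(self, name, **attributes):
        # Register an entity (assumed helper) and merge in any attributes.
        self._attributes.setdefault(name, {}).update(attributes)

    def add_relation(self, source, target, relation_type, value):
        # Record a directed relation (assumed helper) in both lookup tables.
        self.add_entity(source)
        self.add_entity(target)
        self._outgoing[source].setdefault(target, {})[relation_type] = value
        self._incoming[target].setdefault(source, {})[relation_type] = value

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return dict(self._attributes.get(entity, {}))

    def get_outgoing_relations(self, entity):
        return dict(self._outgoing.get(entity, {}))

    def get_incoming_relations(self, entity):
        return dict(self._incoming.get(entity, {}))


def execute_command(scene_graph):
    # Same logic as the first verification program in this figure.
    for entity in scene_graph.get_entities():
        if 'cumulus clouds' in entity:
            relations_out = scene_graph.get_outgoing_relations(entity)
            for target_entity, relation_info in relations_out.items():
                if 'left side of the image' in target_entity:
                    return target_entity
    return None


# Hypothetical sample graph built from the tuples listed for the first example.
graph = SceneGraph()
graph.add_entity('cumulus clouds', state='group')
graph.add_relation('cumulus clouds', 'left side of the image', 'spatial', 'on')
answer = execute_command(graph)
print(answer)
```

Under these assumptions the program returns the target entity 'left side of the image'. The describe and generate_subgraph calls used by some programs are not modeled in this sketch.
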
Image
Caption

A view of 3 different signs on a street corner of a T-shaped intersection. 2 signs are attached to a metal pole planted in an overgrown patch of grass with a silver fire extinguisher. At the top of the metal pole, there is a black rectangular sign with a left facing white arrow. The black text within the arrow reads "ONE WAY". In the middle of the middle pole, there is a red, hexagon shaped sign with white upside down text “STOP”. There is a graffiti tag “PVT A” written next to “STOP”. To the right of the metal pole, there is a power pole with a square blue sign with text “COHN”. Behind the wooden pole, there is a side view of houses with cars parked in front.

Scene Graph
QA + verification programs
Q: What is written next to the "STOP" sign?
A: The graffiti tag "PVT A" is written next to the "STOP" sign.
    def get_graffiti_next_to_stop(scene_graph):
        for entity in scene_graph.get_entities():
            if 'STOP sign' in entity:
                incoming_relations = scene_graph.get_incoming_relations(entity)
                for source_entity, relations in incoming_relations.items():
                    if 'graffiti tag' in source_entity and 'next to' in relations.get('spatial', ''):
                        return scene_graph.get_attributes(source_entity).get('text rendering', "")
        return None

gpt-4o: Next to the "STOP" sign, there is some graffiti that appears to be the letters "PVA" written in white.
hscore: 81.46%
• <whole (graffiti tag)> == <whole (graffiti)> (90.0%)
• <spatial (graffiti tag, "STOP" sign, next to)> == <spatial (graffiti, sign, next to)> (80.0%)
• <text (graffiti tag's text, "PVT A")> == <text rendering (graffiti, "PVA")> (80.0%)
tscore: 87.5%
• <whole (sign)> == <whole (sign)> (100.0%)
• <spatial (graffiti, sign, next to)> == <spatial (graffiti tag, STOP sign, next to)> (90.0%)
• <whole (graffiti)> == <whole (graffiti tag)> (90.0%)

Q: Where is the "COHN" text located?
A: The "COHN" text is located on a square blue sign to the right of the metal pole.
    def find_location_of_cohn(scene_graph):
        for entity in scene_graph.get_entities():
            if 'sign' in entity:
                attrs = scene_graph.get_attributes(entity)
                if 'COHN' in attrs.get('text rendering', ''):
                    outgoing_relations = scene_graph.get_outgoing_relations(entity)
                    for target_entity, relation in outgoing_relations.items():
                        if 'pole' in target_entity:
                            return "sign code located on a square blue sign to the right of the metal pole"
        return None

gpt-4o: The "Cohn" text is located on a blue street sign attached to a wooden utility pole on the right side of the image.
hscore: 70.91%
• <color (sign, blue)> == <color (street sign, blue)> (80.0%)
• <text rendering (sign, "COHN")> == <text rendering (street sign, "Cohn")> (80.0%)
• <spatial (sign, metal pole, to the right of)> == <spatial (street sign, utility pole, attached to)> (70.0%)
tscore: 84.37%
• <material (utility pole, wooden)> == <material (pole, wood)> (90.0%)
• <whole (utility pole)> == <whole (metal pole)> (90.0%)
• <whole (street sign)> == <whole (STOP sign)> (80.0%)

Q: What shape is the intersection in the scene?
A: The intersection in the scene is T-shaped.
    def get_intersection_shape(scene_graph):
        for entity in scene_graph.get_entities():
            if 'intersection' in entity:
                return scene_graph.get_attributes(entity).get('shape', "")
        return None

gpt-4o: The intersection in the scene appears to be a T-shaped intersection. The street in the foreground ends at a perpendicular street, forming the shape of a "T".
hscore: 57.29%
• <spatial (intersection, scene, in)> == <spatial (streets, intersection, at)> (70.0%)
• <shape (intersection, T)> == <spatial (streets, intersection, at)> (40.0%)
tscore: 77.78%
• <spatial (streets, intersection, at)> == <spatial (street corner, intersection, of)> (80.0%)
• <state (streets, perpendicular)> == <image> (70.0%)
• <state (streets, intersecting)> == <image> (70.0%)

Q: What object is planted in the grass?
A: A silver fire extinguisher is planted in the grass.
    def find_object_in_grass(scene_graph):
        for entity in scene_graph.get_entities():
            if 'fire extinguisher' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                if 'grass' in relations and 'in' in relations['grass']['spatial']:
                    return scene_graph.get_attributes(entity).get('color', "") + " fire extinguisher"
        return None

gpt-4o: The object planted in the grass is a fire hydrant.
hscore: 68.41%
• <spatial (fire extinguisher, grass, in)> == <spatial (fire hydrant, grass, in)> (80.0%)
• <whole (fire extinguisher)> == <whole (fire hydrant)> (70.0%)
• <spatial (fire extinguisher, planted)> == <spatial (fire hydrant, grass, in)> (70.0%)
tscore: 78.14%
• <spatial (fire hydrant, grass, in)> == <spatial (fire extinguisher, grass, in)> (80.0%)
• <whole (fire hydrant)> == <whole (fire extinguisher)> (70.0%)
• <part (fire hydrant, planted in grass)> == <image> (70.0%)
Image
Caption

An outdoor angled down view of several black trash bags placed side by side and stacked on top of each other, the trash bag towards the bottom right is placed on top of a wooden beam and several cylindrical cans, there is a pair of gray tennis shoes placed on top of that same bag. The rest of the trash bags are placed on a muddy dirt floor, to the left of the bags is a patch of tall green grass and roots, to the right of trash bags are two white pillars staked into a tiled platform.

Scene Graph
QA + verification programs
Q: Where are the gray tennis shoes located in the scene?
A: The gray tennis shoes are located on top of the trash bag towards the bottom right, which is on top of a wooden beam and several cylindrical cans.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'pair of gray tennis shoes' in entity:
                in_relations = scene_graph.get_incoming_relations(entity)
                for source, relation in in_relations.items():
                    if 'on top of' in relation.get('spatial', ''):
                        return scene_graph.describe(scene_graph.generate_subgraph([source]))
        return None

gpt-4o: The gray tennis shoes are located on top of a black garbage bag in the scene.
hscore: 82.87%
• <whole (tennis shoes)> == <whole (tennis shoes)> (100.0%)
• <color (tennis shoes, gray)> == <color (tennis shoes, gray)> (100.0%)
• <spatial (tennis shoes, trash bag, on top of)> == <spatial (tennis shoes, garbage bag, on top of)> (90.0%)
tscore: 74.62%
• <whole (garbage bag)> == <whole (trash bags)> (90.0%)
• <color (garbage bag, black)> == <color (trash bags, black)> (90.0%)
• <spatial (tennis shoes, garbage bag, on top of)> == <spatial (trash bag towards the bottom right, pair of gray tennis shoes, on top of)> (70.0%)

Q: How are the trash bags arranged?
A: The trash bags are placed side by side and stacked on top of each other in an outdoor angled down view.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'trash bags' in entity:
                return scene_graph.get_attributes(entity).get('state', '')
        return None

gpt-4o: The trash bags in the image are arranged in a cluster on the ground. There are several black trash bags, some of which are stacked on top of each other. One of the bags has a pair of shoes placed on top of it. Additionally, there are some cardboard boxes and other miscellaneous items around the bags. The area appears to be outdoors, possibly in a yard or construction site.
hscore: 87.92%
• <count (trash bags, >=2)> == <count (trash bags, >1)> (90.0%)
• <state (trash bags, stacked on top of each other)> == <state (trash bags, stacked)> (90.0%)
• <spatial (trash bags, outdoor, in)> == <spatial (cardboard boxes, trash bags, around)> (90.0%)
tscore: 84.79%
• <color (trash bags, black)> == <color (trash bags, black)> (100.0%)
• <whole (trash bags)> == <whole (trash bags)> (100.0%)
• <spatial (trash bags, ground, on)> == <spatial (trash bags, dirt floor, on)> (90.0%)
Image
Caption

An outdoor view of a sculpture of an ant colored purple and pink, the ant is wearing blue high heels and is holding a long silver microphone to its mouth. The ant is sitting on a sculpture of a yellow hard shell taco with veggies inside of it. The taco is placed over a path of green turf, the taco is sitting on two parallel red colored rectangular shaped slabs with each side of the taco on one of the slabs. To the right of the ant sculpture is a line of tall green bushes planted side by side. Behind the ant sculpture is a tall black fence, behind the fence are many tall green trees.

Scene Graph
QA + verification programs
Q: What is the ant sculpture seated on?
A: The ant sculpture is seated on a sculpture of a yellow hard shell taco.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'ant' in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                if 'taco' in relations_out and 'on' in relations_out['taco']['spatial']:
                    taco_attrs = scene_graph.get_attributes('taco\'s shell')
                    return f"a sculpture of a {taco_attrs['color']} {taco_attrs['state']} shell taco"
        return None

gpt-4o: The ant sculpture is seated on a large, colorful taco.
hscore: 69.86%
• <whole (taco sculpture)> == <whole (sculpture)> (80.0%)
• <spatial (ant sculpture, taco sculpture, seated on)> == <type (sculpture, ant)> (70.0%)
• <state (taco sculpture, hard shell)> == <state (taco, large)> (70.0%)
tscore: 85.21%
• <whole (sculpture)> == <whole (sculpture)> (100.0%)
• <whole (taco)> == <whole (taco)> (100.0%)
• <state (taco, colorful)> == <state (taco's veggies, inside of taco)> (80.0%)

Q: What is to the right of the ant sculpture?
A: To the right of the ant sculpture is a line of tall green bushes.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'ant sculpture' in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                if 'bushes' in relations_out and 'to the right of' in relations_out['bushes']['spatial']:
                    bushes_attrs = scene_graph.get_attributes('bushes')
                    return f"a line of {bushes_attrs['shape']} {bushes_attrs['color']} bushes"
        return None

gpt-4o: To the right of the ant sculpture, there is a green hedge and a dark green fence.
hscore: 65.32%
• <spatial (bushes, ant sculpture, to the right of)> == <spatial (fence, ant sculpture, to the right of)> (80.0%)
• <color (bushes, green)> == <color (hedge, green)> (80.0%)
• <whole (bushes)> == <whole (hedge)> (70.0%)
tscore: 87.4%
• <whole (fence)> == <whole (fence)> (100.0%)
• <spatial (fence, ant sculpture, to the right of)> == <spatial (ant sculpture, fence, in front of)> (90.0%)
• <color (hedge, green)> == <color (bushes, green)> (80.0%)

Q: Where is the yellow hard shell taco placed?
A: The yellow hard shell taco is placed over a path of green turf.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'taco' in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                if 'path' in relations_out and 'over' in relations_out['path']['spatial']:
                    path_attrs = scene_graph.get_attributes('turf')
                    return f"a path of {path_attrs['color']} turf"
        return None

gpt-4o: The yellow hard shell taco is placed as part of a sculpture in a mini-golf course. The sculpture features a large, colorful ant-like creature sitting on top of the taco, which is positioned over a green putting surface.
hscore: 69.95%
• <color (taco, yellow)> == <color (taco, yellow)> (100.0%)
• <state (taco, hard shell)> == <state (taco, placed)> (80.0%)
• <whole (turf)> == <whole (taco)> (50.0%)
tscore: 84.75%
• <whole (sculpture)> == <whole (sculpture)> (100.0%)
• <whole (taco)> == <whole (taco)> (100.0%)
• <whole (ant)> == <whole (ant)> (90.0%)
Image
Caption

An indoor side view of a bed in the center of a dim lit room, the bed has white mattress covers and white sheets, the bed also has a dark colored frame and headboard. There are two dark colored night stands on both sides of the bed, each with an identical lamp that has a large arm and a cubed shaped shader, the lamp to the left of the bed is on while the right one is off. About four feet in front of the bed is a TV console that has a TV placed on top of it along with a rectangular shaped mirror.

Scene Graph
QA + verification programs
Q: What is the shape of the lamp shader?
A: The lamp shader has a cubed shape.
    def execute_command(scene_graph):
        lamp_shader_attributes = scene_graph.get_attributes("lamps' shader")

        return lamp_shader_attributes.get('shape', 'unknown')

gpt-4o: The lamp shades in the image are trapezoidal in shape.
hscore: 67.19%
• <shape (lamp, cubed)> == <shape (lamp shades, trapezoidal)> (70.0%)
• <whole (lamp)> == <part (lamp shades)> (60.0%)
tscore: 70.15%
• <part (lamp shades)> == <image> (70.0%)
• <shape (lamp shades, trapezoidal)> == <shape (lamps' shader, cubed)> (60.0%)

Q: How many night stands are in the room, and what do they look like?
A: There are two dark colored night stands on both sides of the bed.
    def execute_command(scene_graph):
        night_stands_attributes = scene_graph.get_attributes('night stands')

        if 'count' in night_stands_attributes and 'color' in night_stands_attributes:
            return f"There are {night_stands_attributes['count']} {night_stands_attributes.get('color')} night stands on both sides of the bed."

        return None

gpt-4o: There are two nightstands in the room. They are both black with a single drawer and a lower shelf. Each nightstand has a lamp with a rectangular shade on it.
hscore: 69.36%
• <count (night stands, ==2)> == <count (nightstands, ==2)> (70.0%)
• <spatial (night stands, bed, on both sides of)> == <spatial (lamps, nightstands, on)> (70.0%)
• <color (night stands, dark)> == <color (nightstands, black)> (70.0%)
tscore: 76.23%
• <whole (room)> == <whole (room)> (100.0%)
• <spatial (lamps, nightstands, on)> == <spatial (lamps, night stands, on)> (90.0%)
• <shape (lamps' shade, rectangular)> == <image> (70.0%)
Image
Caption

Large dark gray clouds are looming low over the top of a highway full of cars. A concrete median is in the middle of the highway. Trees are on each side of the highway, with streetlight posts beside them. A yellow sign with an arrow on it pointing upwards is on the right side of the highway. Large power lines towers are on the left side of the highway. A large metal structure with signs on it is over the highway, with cars passing underneath it. The image is taken through a dirty glass window of a car on the highway.

Scene Graph
QA + verification programs
Q: What color are the clouds looming over the highway?
A: The clouds looming over the highway are dark gray in color.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'clouds' in entity:
                attributes = scene_graph.get_attributes(entity)
                if 'dark gray' in attributes.get('color', ''):
                    return 'dark gray'
        return None

gpt-4o: The clouds looming over the highway are dark gray, indicating stormy or rainy weather.
hscore: 88.38%
• <state (clouds, looming)> == <state (clouds, looming)> (100.0%)
• <state (clouds, dark gray in color)> == <color (clouds, dark gray)> (80.0%)
• <state (clouds, over)> == <state (clouds, looming)> (80.0%)
tscore: 89.42%
• <color (clouds, dark gray)> == <color (clouds, dark gray)> (100.0%)
• <whole (clouds)> == <whole (clouds)> (100.0%)
• <whole (highway)> == <whole (highway)> (100.0%)

Q: What's the state of the glass window through which the image is taken?
A: The glass window through which the image is taken is in a dirty state.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'glass window' in entity:
                attributes = scene_graph.get_attributes(entity)
                return attributes.get('state', '')
        return None

gpt-4o: The glass window through which the image is taken appears to be wet, likely due to rain. There are visible water droplets and streaks on the glass, indicating that it is raining or has recently rained.
hscore: 75.08%
• <state (glass window, dirty)> == <state (window, glass)> (70.0%)
tscore: 69.55%
• <state (environment, recently rained)> == <image> (70.0%)
• <state (window, water droplets on)> == <image> (70.0%)
• <state (environment, raining)> == <image> (70.0%)

Q: Where is the concrete median located?
A: The concrete median is located in the middle of the highway.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'median' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                attributes = scene_graph.get_attributes(entity)
                if 'highway' in relations and 'in the middle of' in relations['highway'].get('spatial', ''):
                    return f"The concrete median is in the middle of the highway."
        return None

gpt-4o: The concrete median is located in the center of the highway, separating the lanes of traffic moving in opposite directions.
hscore: 97.72%
• <whole (median)> == <whole (median)> (100.0%)
• <material (median, concrete)> == <material (median, concrete)> (100.0%)
• <spatial (median, highway, in the middle of)> == <spatial (median, highway, in)> (90.0%)
tscore: 93.08%
• <whole (median)> == <whole (median)> (100.0%)
• <material (median, concrete)> == <material (median, concrete)> (100.0%)
• <spatial (median, highway, in)> == <spatial (median, highway, in the middle of)> (90.0%)

Q: What kinds of structures are on either side of the highway?
A: There are trees on each side of the highway, and power line towers are on the left side.
    def execute_command(scene_graph):
        left_side_structures = []
        both_sides_structures = []
        for entity in scene_graph.get_entities():
            relations = scene_graph.get_outgoing_relations(entity)
            if 'highway' in relations:
                spatial_relation = relations['highway'].get('spatial', '')
                if 'on each side of' in spatial_relation:
                    both_sides_structures.append(entity)
                elif 'on the left side of' in spatial_relation:
                    left_side_structures.append(entity)
        return {
            "both_sides": both_sides_structures,
            "left_side": left_side_structures
        }

gpt-4o: On either side of the highway in the image, there are several structures and features visible: - On the left side, there are trees and vegetation, as well as a power line tower. - On the right side, there are also trees and vegetation. - In the distance, there are more power line towers and what appears to be some industrial or commercial buildings. The overall scene shows a highway with surrounding greenery and some infrastructure elements like power lines and distant buildings.
hscore: 76.73%
• <part (power line towers)> == <part (power line tower)> (90.0%)
• <spatial (power line towers, highway, on the left side)> == <spatial (power line tower, highway's left side, on)> (90.0%)
• <spatial (trees, highway, on each side of)> == <spatial (power line tower, highway's left side, on)> (60.0%)
tscore: 78.52%
• <whole (highway)> == <whole (highway)> (100.0%)
• <spatial (power line tower, highway's left side, on)> == <spatial (power lines towers, highway, on the left side of)> (90.0%)
• <part (distant power line towers)> == <whole (power lines towers)> (80.0%)
Image
Caption

An angled top view of a cement square that is sitting on green turf that is painted like dice. The dots are lime green, and the background is light purple. The paint is worn, and there is gray cement showing through in spots. The top shows six worn dots, with parts of the cement weathered away. The front face shows five dots. The light purple paint on top has faded and now looks blue. The sun is shining on top of it. There is a shadow on the ground to the left of it.

Scene Graph
QA + verification programs
Q: What texture is the ground where the cement square is placed?
A: The ground on which the cement square is placed has a green turf texture.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'green turf' in entity:
                return scene_graph.get_attributes(entity).get('texture', "")
        return None

gpt-4o: The ground where the cement square is placed appears to have a textured, artificial grass or turf surface. The texture is uniform and resembles the look of synthetic grass commonly used in sports fields, playgrounds, or decorative areas.
hscore: 86.31%
• <state (ground, has texture)> == <state (ground, textured)> (90.0%)
• <spatial (cement square, ground, on)> == <spatial (cement square, ground, placed on)> (90.0%)
• <type (texture, green turf)> == <state (ground, resembles synthetic grass)> (60.0%)
tscore: 73.96%
• <whole (ground)> == <whole (ground)> (100.0%)
• <spatial (cement square, ground, placed on)> == <spatial (cement square, green turf, on)> (80.0%)
• <state (ground, resembles synthetic grass)> == <image> (70.0%)
Image
Caption

An ancient Greek-style statue is seen from below. The statue is made of a pale gray stone. The statue shows a woman on the left with an unidentifiable group of objects on her lap while holding a torch in her right hand. She partially stands and kneels with a man crouching to the right. The woman is pulling a cloth over the man's shoulders with her left hand as he reaches for something below him. The woman wears a toga style dress and has a large, unidentifiable object behind her in the shape of flowing cloth on the right. The man is undressed. Another unidentifiable object is seen on the bottom left, as if she is stepping on it. There is a window in the background on the left with eccentric framing that shows a man's face above it. Another building with windows is seen reflected in the glass of this window. Two large concrete pillars are seen directly behind the statue, above the borders of an unseen object. There is an overcast shade of sunlight in the scene.

Scene Graph
QA + verification programs
Q: Describe the object that is reflected in the window.
A: Another building with windows is seen reflected in the glass of this window.
def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        if 'building' in entity:
            relations_out = scene_graph.get_outgoing_relations(entity)
            if 'window' in relations_out and 'reflected in' in relations_out['window']['spatial']:
                return scene_graph.describe(scene_graph.generate_subgraph([entity]))
    return None

gpt-4o: The object reflected in the window appears to be a building with multiple windows. The reflection shows a portion of the building's facade, which includes architectural details such as window frames and possibly some decorative elements. The building seems to be of a similar architectural style to the one in the foreground, suggesting it might be part of the same complex or located nearby.
hscore: 67.53%
• <spatial (building, glass, reflected in)> == <spatial (reflected building, window, in)> (90.0%)
• <state (building, with windows)> == <state (reflected building, multiple windows)> (70.0%)
• <whole (building)> == <whole (building in foreground)> (70.0%)
tscore: 78.68%
• <spatial (reflected building, window, in)> == <spatial (building, window, reflected in)> (90.0%)
• <state (reflected building, reflected)> == <spatial (building, window, reflected in)> (70.0%)
• <whole (reflected building)> == <spatial (building, window, reflected in)> (70.0%)
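
The verification program for the reflected building calls get_entities, get_outgoing_relations, get_attributes, generate_subgraph and describe on a scene graph object whose implementation is not shown on this page. The following is only a minimal sketch of a stand-in that supports those calls. The class ToySceneGraph, its add_entity and add_relation helpers, its storage layout, and the text format returned by describe are all assumptions made for illustration, not the actual scene graph implementation.

```python
class ToySceneGraph:
    """Hypothetical stand-in for the scene graph object (an assumption, not the real class)."""

    def __init__(self):
        self.attributes = {}  # entity -> {attribute type -> [values]}
        self.relations = {}   # source -> {target -> {relation type -> [values]}}

    def add_entity(self, name, **attributes):
        # Hypothetical helper: register an entity and its attributes.
        self.attributes[name] = attributes
        self.relations.setdefault(name, {})

    def add_relation(self, source, target, relation_type, value):
        # Hypothetical helper: register a directed relation source -> target.
        targets = self.relations.setdefault(source, {})
        targets.setdefault(target, {}).setdefault(relation_type, []).append(value)

    def get_entities(self):
        return list(self.attributes)

    def get_attributes(self, entity):
        return self.attributes[entity]

    def get_outgoing_relations(self, entity):
        return self.relations[entity]

    def generate_subgraph(self, entities):
        # Keep only the listed entities and the relations among them.
        sub = ToySceneGraph()
        for name in entities:
            sub.add_entity(name, **self.attributes[name])
        for name in entities:
            for target, info in self.relations[name].items():
                if target in sub.attributes:
                    for rel_type, values in info.items():
                        for value in values:
                            sub.add_relation(name, target, rel_type, value)
        return sub

    def describe(self, subgraph):
        # Assumed output format: "entity (attribute values)" joined by "; ".
        parts = []
        for entity in subgraph.get_entities():
            attrs = subgraph.get_attributes(entity)
            values = [v for vals in attrs.values() for v in vals]
            parts.append(f"{entity} ({', '.join(values)})" if values else entity)
        return "; ".join(parts)


def execute_command(scene_graph):
    # Same logic as the verification program shown for this question.
    for entity in scene_graph.get_entities():
        if 'building' in entity:
            relations_out = scene_graph.get_outgoing_relations(entity)
            if 'window' in relations_out and 'reflected in' in relations_out['window']['spatial']:
                return scene_graph.describe(scene_graph.generate_subgraph([entity]))
    return None


graph = ToySceneGraph()
graph.add_entity("window", state=["glass"])
graph.add_entity("building", state=["with windows"])
graph.add_relation("building", "window", "spatial", "reflected in")
result = execute_command(graph)

# A graph without the "reflected in" relation should not verify.
empty_graph = ToySceneGraph()
empty_graph.add_entity("building", state=["with windows"])
no_result = execute_command(empty_graph)
```

In this sketch the program returns a description only when a building entity has an outgoing spatial relation "reflected in" to an entity named exactly "window". A graph that lacks this relation yields None.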
Image
Caption

An outdoor close up view of a black cricket on the ground facing toward the right side of the view with its hind legs slightly angled toward the bottom left of the view. The view is filled with green colored grass, small amounts of soil, and numerous amounts of dried grass. Shadows are visible in the top left of the view under green colored grass, light also shines off of the abdomen and head of the cricket. To the right of the cricket is a small patch of soil beneath the nearby grass.

Scene Graph
QA + verification programs
Q: Describe the scene feature that provides brightness to the cricket's body?
A: Light shines off the cricket's abdomen and head, providing brightness.
def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        if 'light' in entity:
            relations = scene_graph.get_outgoing_relations(entity)
            if "cricket's head" in relations and "cricket's abdomen" in relations:
                description = "Light shines off the cricket's abdomen and head."
                return description
    return None

gpt-4o: The scene features that provide brightness to the cricket's body in the image include the natural light from the environment and the reflective surfaces of the green grass around it. The light appears to be diffused, likely from an overcast sky or shaded area, which helps illuminate the cricket's body and the surrounding grass, making the cricket more visible. The contrast between the dark body of the cricket and the lighter, green grass also helps in highlighting the cricket.
hscore: 77.76%
• <state (cricket, shining)> == <state (cricket's body, dark)> (80.0%)
• <spatial (light, cricket's abdomen and head, off)> == <spatial (cricket, grass, on)> (70.0%)
• <state (light, providing brightness)> == <state (light, diffused)> (70.0%)
tscore: 77.6%
• <whole (cricket)> == <whole (cricket)> (100.0%)
• <spatial (cricket, grass, on)> == <spatial (cricket, ground, on)> (90.0%)
• <spatial (cricket, environment, in)> == <spatial (cricket, ground, on)> (80.0%)
Image
Caption

An outdoor medium-close-up view of a white rhododendron plant that is sprouting from a plant that is made up of large green leaves that grow in all directions and slightly down as well. Along the leaves, there are small white specks that are spread throughout. Also surrounding the white flower are branches that run horizontally and vertically. Underneath the plant, there are dried brown leaves all throughout the floor. It is daytime, as the plant can be seen clearly.

Scene Graph
QA + verification programs
Q: What is sprouting from the plant with large leaves?
A: A white rhododendron plant is sprouting from the plant with large leaves.
def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        attributes = scene_graph.get_attributes(entity)
        relations = scene_graph.get_outgoing_relations(entity)
        if 'rhododendron plant' in entity and 'sprout' in attributes.get('state', ''):
            for target_entity in relations:
                if 'plant' in target_entity:
                    return scene_graph.get_attributes(entity).get('color', ''), entity
    return None

gpt-4o: The plant with large leaves in the image has clusters of light pink or white flowers sprouting from it. These flowers appear to be blooming from the center of the plant.
hscore: 62.33%
• <part (rhododendron plant's leaves)> == <part (plant's flowers)> (70.0%)
• <whole (rhododendron plant)> == <whole (plant)> (60.0%)
• <spatial (rhododendron plant, sprouting from the plant with large leaves)> == <spatial (flowers, plant, from)> (60.0%)
tscore: 80.06%
• <whole (plant)> == <whole (plant)> (100.0%)
• <color (flowers, light pink or white)> == <color (flower, white)> (80.0%)
• <type (plant, with large leaves)> == <image> (70.0%)

Q: How do the branches run around the white flower?
A: The branches run horizontally and vertically around the white flower.
def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        attributes = scene_graph.get_attributes(entity)
        relations = scene_graph.get_outgoing_relations(entity)
        if 'branches' in entity and 'flower' in relations:
            return attributes.get('state', ''), entity
    return None

gpt-4o: In the image, the branches around the white flower appear to be quite intertwined and somewhat chaotic. Here's a detailed description: 1. **Main Branch**: There is a main branch that runs horizontally behind the white flower cluster. This branch is relatively thick and supports several smaller branches and leaves. 2. **Secondary Branches**: Several secondary branches extend from the main branch. These branches are thinner and more flexible, and they spread out in various directions around the flower. 3. **Leaf Arrangement**: The leaves are attached to the secondary branches and are arranged in a way that they surround the flower cluster. The leaves are large, elongated, and have a leathery texture. 4. **Branch Interaction**: Some branches cross over each other, creating a network of twigs and stems that encircle the flower. This gives the appearance of the flower being nestled within a protective framework of branches and leaves. 5. **Background Branches**: In the background, additional branches and foliage can be seen, adding to the dense and intertwined nature of the plant structure. Overall, the branches create a natural, somewhat messy arrangement around the white flower, providing a supportive and protective environment for the bloom.
hscore: 81.26%
• <color (flower, white)> == <color (flower, white)> (100.0%)
• <spatial (branches, flower, around)> == <spatial (secondary branches, flower, around)> (90.0%)
• <part (flower's branches)> == <part (leaves)> (70.0%)
tscore: 81.74%
• <whole (flower)> == <whole (flower)> (100.0%)
• <color (flower, white)> == <color (flower, white)> (100.0%)
• <spatial (secondary branches, flower, around)> == <spatial (branches, flower, surround)> (90.0%)

Q: Can you describe the small white specks?
A: The small white specks are spread throughout and along the leaves.
def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        attributes = scene_graph.get_attributes(entity)
        relations = scene_graph.get_outgoing_relations(entity)
        if 'specks' in entity:
            return scene_graph.describe(scene_graph.generate_subgraph([entity]))
    return None

gpt-4o: The small white specks on the leaves in the image appear to be some form of residue or possibly a fungal growth. They are scattered across the surface of the leaves, giving them a speckled appearance. This could be due to various factors such as environmental conditions, pests, or diseases affecting the plant. If the specks are powdery and can be wiped off, it might be powdery mildew, a common fungal disease. If they are more embedded into the leaf surface, it could be a different type of fungal infection or residue from pests.
hscore: 75.07%
• <whole (leaves)> == <part (leaves)> (80.0%)
• <state (leaves, small white specks are on)> == <state (leaves, with specks)> (70.0%)
• <state (small white specks, spread throughout and along the leaves)> == <state (leaves, with specks)> (60.0%)
tscore: 78.75%
• <color (specks, white)> == <color (specks, white)> (100.0%)
• <state (specks, scattered)> == <state (specks, spread throughout)> (80.0%)
• <state (specks, small)> == <state (specks, spread throughout)> (80.0%)