Image
Caption

An eye-level view of a tree growing out of a shallow stream of water in the middle of the image. There are rocks visible below the surface of the transparent water. There are trees on both sides of the stream, casting shadows over most of the water. The reflection of the tree growing in the middle of the stream is shining off of the water below it. In the bottom right corner of the image, a gray rock surface is visible. The sky at the top of the image is clear blue with no cloud coverage.

Scene Graph
QA + verification programs
Q: How would you describe the condition of the sky in the image?
A: The sky at the top of the image is clear blue with no cloud coverage.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'sky' in entity:
                sky_attributes = scene_graph.get_attributes(entity)
                return f"The sky is {sky_attributes.get('color', '')} with {sky_attributes.get('state', '')} coverage."
        return None

gpt-4o: The sky in the image appears to be clear and bright blue, indicating good weather conditions with no visible clouds.
hscore: 82.98%
• <state (sky, clear)> == <state (sky, clear)> (100.0%)
• <color (sky, blue)> == <color (sky, bright blue)> (90.0%)
• <coverage (sky, clouds, none)> == <state (sky, clear)> (50.0%)
tscore: 77.37%
• <color (sky, bright blue)> == <color (sky, clear blue)> (90.0%)
• <state (sky, clear)> == <state (sky, no cloud coverage)> (80.0%)
• <condition (weather, good)> == <image> (70.0%)
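
To make the verification program for the sky question concrete, here is a minimal sketch that runs it against a toy scene graph. `ToySceneGraph` is a hypothetical stand-in, not the dataset's actual class. It implements only the two methods the program calls, and it assumes attribute values are plain strings, which the excerpt does not confirm.

```python
class ToySceneGraph:
    """Hypothetical stand-in for the scene graph object (illustration only)."""

    def __init__(self, attributes):
        # Assumed layout: {entity name: {attribute type: value as a string}}
        self._attributes = attributes

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})


def execute_command(scene_graph):
    # Same logic as the verification program for the sky question.
    for entity in scene_graph.get_entities():
        if 'sky' in entity:
            sky_attributes = scene_graph.get_attributes(entity)
            return f"The sky is {sky_attributes.get('color', '')} with {sky_attributes.get('state', '')} coverage."
    return None


# Toy graph hand-built from the caption; the attribute split is an assumption.
graph = ToySceneGraph({
    "stream": {"state": "shallow"},
    "sky": {"color": "clear blue", "state": "no cloud"},
})
print(execute_command(graph))
```

With this toy graph the program reproduces the reference answer's wording. If no entity name contains 'sky', it returns None.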

Q: What can you tell me about the tree growing in the stream?
A: The tree is growing out of and reflecting in the shallow stream in the middle of the image.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'tree' in entity and 'grow' in scene_graph.get_attributes(entity).get('state', ''):
                return scene_graph.describe(scene_graph.generate_subgraph([entity]))
        return None

gpt-4o: The tree growing in the stream appears to be a sycamore tree (Platanus occidentalis), which is known for its ability to thrive in wet environments, including along streams and rivers. Here are some key characteristics and information about sycamore trees: 1. **Habitat**: Sycamore trees are commonly found in riparian zones, which are the interfaces between land and a river or stream. They are well-adapted to wet soils and can often be seen growing directly in or near water bodies. 2. **Leaves**: The leaves of the sycamore tree are large, broad, and typically have a palmate shape with 3-5 lobes. They are green during the growing season and turn brown in the fall. 3. **Bark**: One of the distinctive features of sycamore trees is their bark, which peels off in patches to reveal a smooth, whitish or light-colored underbark. This gives the trunk a mottled appearance. 4. **Size**: Sycamore trees can grow to be quite large, often reaching heights of 70-100 feet or more, with a wide canopy spread. 5. **Ecological Importance**: These trees provide important habitat and food for various wildlife species. Their large size and extensive root systems help stabilize stream banks and reduce erosion. 6. **Growth**: Sycamores are fast-growing trees, which allows them to quickly establish themselves in suitable environments. In the image, the sycamore tree is growing directly in the stream, which is typical for this species. The clear water and surrounding vegetation suggest a healthy riparian ecosystem.
hscore: 82.18%
• <spatial (tree, stream, in)> == <spatial (tree, stream, in)> (100.0%)
• <spatial (tree, stream, out of)> == <spatial (tree, stream, in)> (90.0%)
• <state (tree, growing)> == <state (tree, thriving)> (80.0%)
tscore: 75.32%
• <spatial (tree, stream, in)> == <spatial (tree, stream, in)> (100.0%)
• <whole (tree)> == <whole (tree)> (100.0%)
• <spatial (tree, surrounding vegetation, near)> == <spatial (trees, water, over)> (80.0%)

Q: What is located in the bottom right corner of the image?
A: A gray rock surface is located in the bottom right corner of the image.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'rock surface' in entity:
                surface_relations = scene_graph.get_outgoing_relations(entity)
                if 'bottom right corner of the image' in surface_relations:
                    return scene_graph.describe(scene_graph.generate_subgraph([entity]))
        return None

gpt-4o: In the bottom right corner of the image, there are some rocks and vegetation along the edge of a shallow stream. The vegetation includes small bushes and grass-like plants.
hscore: 59.37%
• <whole (rock surface)> == <whole (rocks)> (70.0%)
• <spatial (rock surface, image, in bottom right corner)> == <spatial (rocks, stream's edge, at)> (60.0%)
• <color (rock surface, gray)> == <spatial (rocks, stream's edge, at)> (40.0%)
tscore: 83.88%
• <whole (stream)> == <whole (stream)> (100.0%)
• <state (stream, shallow)> == <state (stream, shallow)> (100.0%)
• <whole (rocks)> == <whole (rocks)> (100.0%)
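
The bulleted lines under each hscore and tscore share one shape: a tuple in angle brackets, `==`, a second tuple or `<image>`, and a similarity percentage. The sketch below parses one such line into a small record. The names `TupleMatch` and `parse_match_line` are hypothetical. The excerpt does not say how these similarities are aggregated into hscore or tscore, so the sketch only parses and does not try to recompute either score.

```python
import re
from typing import NamedTuple, Optional


class TupleMatch(NamedTuple):
    left: str          # text inside the first <...>
    right: str         # text inside the second <...>, or "image"
    similarity: float  # percentage converted to a fraction in [0, 1]


# Optional bullet, two angle-bracketed tuples joined by "==", then "(NN.N%)".
_LINE = re.compile(r"^\s*•?\s*<(.+?)>\s*==\s*<(.+?)>\s*\((\d+(?:\.\d+)?)%\)\s*$")


def parse_match_line(line: str) -> Optional[TupleMatch]:
    """Return a TupleMatch for a match line, or None for any other line."""
    m = _LINE.match(line)
    if m is None:
        return None
    left, right, pct = m.groups()
    return TupleMatch(left.strip(), right.strip(), float(pct) / 100.0)


for line in [
    "• <whole (rock surface)> == <whole (rocks)> (70.0%)",
    "• <condition (weather, good)> == <image> (70.0%)",
    "hscore: 59.37%",
]:
    print(parse_match_line(line))
```

Lines that are not tuple matches, such as the `hscore:` header, come back as None, so a caller can filter them out.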
Image
Caption

A cream-colored labradoodle standing on a wooden bench painted light brown. A red leash is attached to the dog's collar, extending vertically from the bottom of the image. The dog's body is facing the left side of the image, and its head is facing forward toward the bottom of the image. The bench is placed on a pebble surface. To the left of the bench is a gray metal chain link border attached to a dark gray wooden post that extends to the top left corner of the image. Behind the barrier is another wooden object. There is a light brown wooden fence behind the bench with horizontal planks. There is a space between the barrier and the fence. There is a faint shadow cast from the left side of the image; it covers most of the image and cuts off in the bottom right corner. The dog is casting a shadow on the bench going toward the bottom of the image.

Scene Graph
QA + verification programs
Q: What is attached to the dog's collar?
A: A red leash is attached to the dog's collar.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "leash" in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                if "dog's collar" in relations_out:
                    return 'leash'
        return None

gpt-4o: Attached to the dog's collar is a red leash.
hscore: 99.09%
• <part (leash)> == <part (leash)> (100.0%)
• <color (leash, red)> == <color (leash, red)> (100.0%)
• <spatial (leash, dog's collar, attached to)> == <spatial (leash, dog's collar, attached to)> (90.0%)
tscore: 90.56%
• <color (leash, red)> == <color (leash, red)> (100.0%)
• <spatial (leash, dog's collar, attached to)> == <spatial (leash, dog's collar, attached to)> (90.0%)
• <whole (dog's collar)> == <part (dog's collar)> (80.0%)

Q: Describe the dog.
A: The dog is a cream-colored labradoodle standing on a wooden bench. Its body is facing the left, and its head is facing forward toward the bottom of the image.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'labradoodle' in entity:
                return scene_graph.describe(scene_graph.generate_subgraph([entity]))
        return None

gpt-4o: The dog in the image appears to be a young, fluffy, and curly-coated breed, possibly a Labradoodle or a similar type of doodle mix. It has a light cream or beige-colored coat with a slightly darker nose. The dog is standing on a wooden bench and is on a red leash. The background includes a wooden fence and some gravel. The dog looks healthy and well-groomed.
hscore: 68.76%
• <color (labradoodle, cream> == <color (dog's coat, light cream or beige)> (80.0%)
• <whole (labradoodle)> == <whole (dog)> (80.0%)
• <material (bench, wooden)> == <material (dog's bench, wooden)> (80.0%)
tscore: 78.29%
• <whole (dog)> == <whole (dog)> (100.0%)
• <color (dog's leash, red)> == <color (leash, red)> (90.0%)
• <material (dog's bench, wooden)> == <material (bench, wood)> (80.0%)
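
Several programs in these examples answer open-ended questions with `scene_graph.describe(scene_graph.generate_subgraph([entity]))`. The excerpt does not show what either method does, so the sketch below is only one plausible reading: `generate_subgraph` keeps the named entities plus the relations that start at them, and `describe` renders them as short sentences. The class `ToySceneGraph`, its internal layout, and the sentence format are all assumptions made for illustration.

```python
class ToySceneGraph:
    """Hypothetical stand-in; the real class and its output format are not shown."""

    def __init__(self, attributes, relations=None):
        self._attributes = attributes      # assumed: {entity: {attribute type: value}}
        self._relations = relations or {}  # assumed: {source: {target: {relation type: text}}}

    def get_entities(self):
        return list(self._attributes)

    def generate_subgraph(self, entities):
        # Keep only the requested entities and the relations that start at them.
        keep = set(entities)
        return ToySceneGraph(
            {e: a for e, a in self._attributes.items() if e in keep},
            {s: t for s, t in self._relations.items() if s in keep},
        )

    def describe(self, subgraph):
        # Assumed format: one plain sentence per attribute and per relation.
        parts = []
        for entity, attrs in subgraph._attributes.items():
            for attr_type, value in attrs.items():
                parts.append(f"{entity} has {attr_type} {value}.")
        for source, targets in subgraph._relations.items():
            for target, rels in targets.items():
                for text in rels.values():
                    parts.append(f"{source} is {text} {target}.")
        return " ".join(parts)


def execute_command(scene_graph):
    # Same logic as the verification program for "Describe the dog."
    for entity in scene_graph.get_entities():
        if 'labradoodle' in entity:
            return scene_graph.describe(scene_graph.generate_subgraph([entity]))
    return None


graph = ToySceneGraph(
    {"labradoodle": {"color": "cream"}, "bench": {"material": "wooden"}},
    {"labradoodle": {"bench": {"spatial": "standing on"}}},
)
print(execute_command(graph))
```

Under these assumptions the description covers only the labradoodle's own attributes and outgoing relations. The bench's material is left out because the bench is not part of the requested subgraph.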

Q: What is to the left of the bench?
A: To the left of the bench is a gray metal chain link border attached to a dark gray wooden post.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'chain link border' in entity:
                relations_out = scene_graph.get_outgoing_relations(entity)
                if 'bench' in relations_out and 'left of' in relations_out['bench']['spatial']:
                    return 'chain link border'
        return None

gpt-4o: To the left of the bench, there is a metal fence and some gravel on the ground. There are also some leaves scattered among the gravel.
hscore: 54.77%
• <spatial (chain link border, bench, to the left of)> == <spatial (metal fence, bench, to the left of)> (70.0%)
• <whole (chain link border)> == <whole (metal fence)> (60.0%)
• <part (chain link border's wooden post)> == <whole (metal fence)> (50.0%)
tscore: 75.2%
• <whole (bench)> == <whole (bench)> (100.0%)
• <whole (metal fence)> == <whole (fence)> (80.0%)
• <spatial (metal fence, bench, to the left of)> == <spatial (fence, bench, behind)> (80.0%)
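
The program for the bench question indexes `relations_out['bench']['spatial']`, which suggests that outgoing relations are keyed first by target entity and then by relation type. The sketch below runs that program against a toy graph with this inferred layout. `ToySceneGraph` is hypothetical, and storing each relation as a plain string is an assumption the excerpt does not confirm.

```python
class ToySceneGraph:
    """Hypothetical stand-in; the real scene graph class is not shown in the source."""

    def __init__(self, relations):
        # Assumed layout: {source entity: {target entity: {relation type: relation text}}}
        self._relations = relations

    def get_entities(self):
        return list(self._relations)

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})


def execute_command(scene_graph):
    # Same logic as the verification program for "What is to the left of the bench?"
    for entity in scene_graph.get_entities():
        if 'chain link border' in entity:
            relations_out = scene_graph.get_outgoing_relations(entity)
            if 'bench' in relations_out and 'left of' in relations_out['bench']['spatial']:
                return 'chain link border'
    return None


# Toy graph hand-built from the caption.
graph = ToySceneGraph({
    "chain link border": {
        "bench": {"spatial": "to the left of"},
        "wooden post": {"spatial": "attached to"},
    },
    "bench": {"pebble surface": {"spatial": "on"}},
})
print(execute_command(graph))
```

The `'left of' in ...` test depends on the stored type. With a string it is a substring check and matches "to the left of". If the relation were stored as a list such as `["to the left of"]`, the same expression would be an exact-membership check and the program would return None.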
Image
Caption

Eye-level view during the day of the side of a building that has a brown brick wall with a painting of a young woman, an eagle and a coyote on it. The woman has long flowing hair and she wears a broad-brimmed hat and an elaborate necklace. The head of a coyote is painted to the left of the woman, and above the coyote is painted a brown eagle spreading its wings. Bushes grow from the base of the wall, partly covering the bottom of the painting and spreading along the gravel on the ground. A pipe with stickers on it goes up the wall, just to the right of the painting. NANU is written at the top of the wall. A small portion of a green tree can be seen just to the top left of the building, with sky stretching toward the right.

Scene Graph
QA + verification programs
Q: What are the different elements painted on the wall?
A: The elements painted on the wall are a young woman, an eagle, and a coyote.
    def execute_command(scene_graph):
        items_on_wall = []
        for entity in scene_graph.get_entities():
            outgoing_rels = scene_graph.get_outgoing_relations(entity)
            if 'painting' in outgoing_rels and 'on' in outgoing_rels['painting']['spatial']:
                items_on_wall.append(entity)
        return items_on_wall

gpt-4o: The mural on the wall features several distinct elements: 1. A large bird, likely an eagle or hawk, with detailed feathers and a prominent beak, flying in the upper left section. 2. A white wolf or dog, depicted in profile, occupying the lower left section. 3. A person wearing a wide-brimmed hat and adorned with jewelry, positioned on the right side of the mural. 4. A background with abstract shapes and colors, including shades of blue, purple, and brown. These elements are painted in a stylized, artistic manner, creating a visually striking composition.
hscore: 53.98%
• <part (wall's painted elements)> == <part (mural's background)> (60.0%)
• <whole (young woman)> == <whole (mural)> (50.0%)
• <spatial (young woman, eagle, coyote, wall, on)> == <type (mural's bird, eagle or hawk)> (50.0%)
tscore: 61.64%
• <whole (mural)> == <whole (painting)> (80.0%)
• <state (mural's bird, flying)> == <state (eagle, spread its wings)> (70.0%)
• <color (mural's background, blue)> == <image> (70.0%)

Q: Where is the pipe located in relation to the painting?
A: The pipe is located to the right of the painting.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'pipe' in entity:
                outgoing_rels = scene_graph.get_outgoing_relations(entity)
                if 'painting' in outgoing_rels:
                    return outgoing_rels['painting']['spatial']
        return None

gpt-4o: The pipe is located to the right of the painting. It runs vertically along the wall, starting from the ground and extending upwards, passing through the middle of the rightmost section of the mural.
hscore: 100%
• <spatial (pipe, painting, to the right of)> == <spatial (pipe, painting, to the right of)> (100.0%)
tscore: 82.53%
• <whole (pipe)> == <whole (pipe)> (100.0%)
• <spatial (pipe, wall, along)> == <spatial (pipe, wall, up)> (90.0%)
• <spatial (pipe, painting, to the right of)> == <spatial (pipe, painting, right of)> (90.0%)

Q: Can you describe the young woman in the painting?
A: The young woman has long flowing hair, wears a broad-brimmed hat, and an elaborate necklace.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'woman' in entity:
                return scene_graph.describe(scene_graph.generate_subgraph([entity]))
        return None

gpt-4o: The painting depicts a young woman with a serious expression. She is wearing a large, wide-brimmed hat and has long hair. Her attire includes a necklace with multiple strands of beads, which adds a decorative element to her appearance. The background features a stylized eagle in flight and a wolf's head, suggesting themes of nature and wildlife. The colors used are predominantly shades of blue, purple, and brown, creating a vibrant and dynamic composition. The mural is painted on a brick wall, and there is some greenery at the base of the wall.
hscore: 84.65%
• <part (young woman's necklace)> == <part (young woman's necklace)> (100.0%)
• <state (young woman's necklace, elaborate)> == <state (young woman's necklace, multiple strands)> (90.0%)
• <state (young woman's hat, broad> == <state (young woman, wearing a hat)> (80.0%)
tscore: 75.45%
• <whole (painting)> == <whole (painting)> (100.0%)
• <whole (eagle)> == <whole (eagle)> (100.0%)
• <part (young woman's necklace)> == <part (woman's necklace)> (90.0%)

Q: What is growing from the base of the wall?
A: Bushes are growing from the base of the wall, spreading along the gravel on the ground and partly covering the painting.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'bushes' in entity:
                attributes = scene_graph.get_attributes(entity)
                return attributes.get('state', '')
        return None

gpt-4o: There are green plants growing from the base of the wall. They appear to be some type of vine or ground cover, spreading out along the ground and up the wall.
hscore: 65.82%
• <spatial (bushes, ground, on)> == <spatial (plants, ground, along)> (80.0%)
• <spatial (bushes, wall, from the base of)> == <spatial (plants, base of the wall, from)> (80.0%)
• <spatial (bushes, gravel, along)> == <spatial (plants, ground, along)> (70.0%)
tscore: 76.58%
• <spatial (plants, ground, along)> == <spatial (bushes, ground, spread along)> (80.0%)
• <spatial (plants, base of the wall, from)> == <spatial (bushes, wall, from)> (80.0%)
• <spatial (plants, wall, up)> == <spatial (bushes, wall, from)> (70.0%)

Q: What is written at the top of the wall?
A: 'NANU' is written at the top of the wall.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'NANU' in entity:
                attributes = scene_graph.get_attributes(entity)
                return attributes.get('text rendering', '')
        return None

gpt-4o: At the top of the wall, the word "NANI" is written.
hscore: 63.64%
• <text rendering (wall, 'NANU')> == <text rendering (wall, "NANI")> (80.0%)
• <spatial (text, top of the wall, at)> == <text rendering (wall, "NANI")> (30.0%)
tscore: 73.85%
• <text rendering (wall, "NANI")> == <text rendering (NANU, "NANU")> (70.0%)

Q: What type of material is the wall made of?
A: The wall is made of brown brick.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'wall' in entity:
                attributes = scene_graph.get_attributes(entity)
                return attributes.get('material', '')
        return None

gpt-4o: The wall in the image appears to be made of brick. The horizontal lines and the texture visible in the image are characteristic of brick construction.
hscore: 77.58%
• <material (wall, brick)> == <material (wall, brick)> (100.0%)
• <color (wall, brown)> == <material (wall, brick)> (50.0%)
tscore: 84.97%
• <whole (wall)> == <whole (wall)> (100.0%)
• <material (wall, brick)> == <material (wall, brown brick)> (90.0%)
• <characteristic (wall, texture)> == <image> (70.0%)

Q: How is the eagle depicted in the painting?
A: The eagle is depicted spreading its wings and is brown.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'eagle' in entity:
                return scene_graph.describe(scene_graph.generate_subgraph([entity]))
        return None

gpt-4o: In the painting, the eagle is depicted in flight with its wings spread wide. The eagle's body is primarily brown, with a white head and tail, and it has red feathers on its wings. The bird is positioned in the upper left section of the mural, soaring against a blue and purple background. The eagle appears to be a prominent and dynamic element of the artwork, contributing to the overall theme and composition.
hscore: 94.39%
• <state (eagle, spread wings)> == <state (eagle, wings spread wide)> (90.0%)
• <color (eagle, brown)> == <color (eagle's body, brown)> (90.0%)
tscore: 85.15%
• <whole (eagle)> == <whole (eagle)> (100.0%)
• <whole (painting)> == <whole (painting)> (100.0%)
• <color (eagle's body, brown)> == <color (eagle, brown)> (90.0%)
Image
Caption

An outdoor top-down view of lane two of a rubber track on a bright sunny day. The white spray-painted stencil-font number "2" sits between two vertical white lines. The three sections of the number two have a light outline from overspray. The white paint has been worn, as pieces of the black rubber are visible throughout the number and the lines. Small white pieces of rubber are also visible scattered across the track.

Scene Graph
QA + verification programs
Q: Can you explain the condition of the paint on the number "2"?
A: The paint on the number "2" is worn, with pieces of black rubber visible through it.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'paint' in entity:
                return scene_graph.get_attributes(entity).get('state', "")
        return None

gpt-4o: The paint on the number "2" appears to be in relatively good condition. The white paint is still quite visible and contrasts well against the dark background. However, there are some signs of wear and fading, particularly around the edges of the number. The outline of the number is slightly blurred, indicating that the paint may be starting to wear off in those areas. Overall, the paint is still intact and legible, but it shows some minor signs of aging and use.
hscore: 69.15%
• <state (number 2's paint, worn)> == <state (number '2', paint, intact)> (80.0%)
• <state (number 2, paint, worn)> == <state (number '2', paint, intact)> (80.0%)
• <part (number 2's paint)> == <part (number '2')> (70.0%)
tscore: 72.9%
• <state (number '2', paint, worn off, around edges)> == <state (paint, worn)> (80.0%)
• <color (number '2', paint, white)> == <color (paint, white)> (80.0%)
• <part (number '2')> == <image> (70.0%)
Image
Caption

A large Didone Room number font "7" is seen below, painted on a sports track. The number has a wide, rounded bottom of the stem and a small line on the left edge of the horizontal line, which is also thick and rounded with a small upward tail on the right. The number is stenciled on and shows a white, faded chalk outline around it that has imperfections in small areas. A small gap between the top horizontal line and the stem of the number shows that a stencil was used. The track has a dark gray rubber particle surface. Small white flecks are seen in the dark gray of the rubber. There are two white lines, one on each side of the number. They both tilt inward at the top, showing the angle from which the number is seen. The right line tilts more than the left line. A natural light shines from the bottom edge of the image, showing the top edge just slightly darker.

Scene Graph
QA + verification programs
Q: What type of font is used for the number on the sports track?
A: The font is a Didone Room type.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            attributes = scene_graph.get_attributes(entity)
            if 'font' in entity and 'Didone Room' in attributes.get('type', ''):
                return attributes.get('type', None)
        return None

gpt-4o: The number on the sports track appears to be in a bold, sans-serif font. This type of font is commonly used for track numbers because it is clear and easy to read from a distance. The specific style of the number "7" in the image has a slightly stylized design, which is often used to give a unique look to the track markings.
hscore: 34.42%
• <text rendering (building, "MOTOR WORKS")> == <style (sports track's numbers, sans> (30.0%)
• <type (text, Didone Room)> == <type (sports track's numbers, font)> (30.0%)
tscore: 74.58%
• <type (sports track's numbers, font)> == <spatial (number font, sports track, below)> (80.0%)
• <type (sports track, track)> == <image> (70.0%)
• <whole (sports track)> == <image> (70.0%)

Q: Identify the specific shapes and features of the number's horizontal line?
A: The horizontal line is thick, rounded, with a small upward tail and a small line on the left edge.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            attributes = scene_graph.get_attributes(entity)
            if "number's horizontal line" in entity:
                return attributes.get('shape', None)
        return None

gpt-4o: The number in the image is a stylized "7". The horizontal line at the top of the "7" has a unique shape and features: 1. **Curved Shape**: The horizontal line is not straight; it has a curved, wave-like shape. 2. **Thick and Thin Variations**: The line varies in thickness, being thicker in the middle and tapering off towards the ends. 3. **Extended End**: The left end of the horizontal line extends slightly beyond the vertical stroke of the "7". These features give the number a distinctive and artistic appearance.
hscore: 69.61%
• <shape (line, horizontal)> == <shape (horizontal line, curved)> (80.0%)
• <state (line, edge, left, small line, on)> == <state (horizontal line, extended end)> (70.0%)
• <state (line, tail, upward, small)> == <state (horizontal line, extended end)> (70.0%)
tscore: 79.39%
• <part (number's horizontal line)> == <part (number's horizontal line)> (100.0%)
• <whole (stylized number)> == <whole (number)> (70.0%)
• <spatial (horizontal line, vertical stroke, beyond)> == <image> (70.0%)

Q: Is there a gap between any parts of the number, and if so, where?
A: Yes, there is a small gap between the top horizontal line and the stem of the number.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            relations_out = scene_graph.get_outgoing_relations(entity)
            if 'gap' in entity:
                between_parts = [target for target in relations_out.keys() if 'between' in relations_out[target]['spatial']]
                return f"There is a small gap between {', '.join(between_parts)}."
        return None

gpt-4o: Yes, there is a gap between the top and bottom parts of the number "7" in the image. The top part, which is horizontal, is separated from the bottom part, which is diagonal, creating a distinct gap between the two sections.
hscore: 43.92%
• <state (text, peeling)> == <state (number 7, gap between top and bottom parts)> (40.0%)
• <state (text, faded)> == <state (number 7, separated)> (40.0%)
tscore: 71.06%
• <spatial (number 7, top part, horizontal)> == <image> (70.0%)
• <spatial (number 7, bottom part, diagonal)> == <image> (70.0%)
• <state (number 7, separated)> == <state (number, faded chalk outline)> (70.0%)
Image
Caption

A black spaceship hanging from the ceiling; the spaceship appears to be viewed from a worm's eye view. Below the spaceship is a black disk with a design of white lines and circles with dots. The design has white light seeping through it, and there is a circular base below with a red LED strip surrounding it. The ship also has a grey dome with long and thin poles that poke out in different directions; at the tips of the poles are bright red LED lights. The ship appears to be in a dark room with a white metal beam ceiling.

Scene Graph
QA + verification programs
Q: What color is the spaceship that's hanging from the ceiling, and what can you see in the room?
A: The spaceship hanging from the ceiling is black, and it appears to be in a dark room with a white metal beam ceiling.
    def execute_command(scene_graph):
        spaceship_color = None
        room_condition = None
        scene_description = []
        for entity in scene_graph.get_entities():
            if 'spaceship' in entity and 'ceiling' in scene_graph.get_outgoing_relations(entity):
                spaceship_color = scene_graph.get_attributes(entity).get('color', None)
            if 'room' in entity:
                room_condition = scene_graph.get_attributes(entity).get('state', None)
                ceiling_material = scene_graph.get_attributes('ceiling').get('material', None)
                ceiling_color = scene_graph.get_attributes('ceiling').get('color', None)
                if ceiling_material or ceiling_color:
                    scene_description.append(f"a {ceiling_color} {ceiling_material} ceiling")
        return f"The spaceship hanging from the ceiling is {spaceship_color}, and it appears to be in a {room_condition} room with {' '.join(scene_description)}." if spaceship_color and room_condition else "Couldn't determine the description."

gpt-4o: The spaceship hanging from the ceiling is primarily black with white and blue circular and linear patterns, and it has red lights around its center. In the room, you can see part of a wall with star decorations and a doorway leading to another room. The ceiling has exposed beams and pipes, contributing to an industrial look.
hscore: 80.57%
• <whole (spaceship)> == <whole (spaceship)> (100.0%)
• <whole (room)> == <whole (room)> (100.0%)
• <state (spaceship, hanging from the ceiling)> == <state (spaceship, hanging from ceiling)> (90.0%)
tscore: 77.32%
• <whole (ceiling)> == <whole (ceiling)> (100.0%)
• <whole (spaceship)> == <whole (spaceship)> (100.0%)
• <whole (room)> == <whole (room)> (100.0%)

Q: What kind of beam makes up the ceiling, and what is its color?
A: The ceiling is made up of white metal beams.
    def execute_command(scene_graph):
        ceiling_material = None
        ceiling_color = None
        for entity in scene_graph.get_entities():
            if 'ceiling' in entity:
                ceiling_material = scene_graph.get_attributes(entity).get('material', None)
                ceiling_color = scene_graph.get_attributes(entity).get('color', None)
        return f"The ceiling is made up of {ceiling_color} {ceiling_material} beams." if ceiling_material and ceiling_color else "Couldn't describe the ceiling beam."

gpt-4o: The ceiling is made up of metal beams, specifically I-beams, and they are gray in color.
hscore: 89.65%
• <material (ceiling's beams, metal)> == <material (ceiling's metal beams, metal)> (90.0%)
• <part (ceiling's beams)> == <part (ceiling's metal beams)> (90.0%)
• <color (ceiling's beams, white)> == <color (ceiling's metal beams, gray)> (80.0%)
tscore: 85.87%
• <whole (ceiling)> == <whole (ceiling)> (100.0%)
• <material (ceiling's metal beams, metal)> == <material (ceiling, metal beam)> (90.0%)
• <part (ceiling's metal beams)> == <material (ceiling, metal beam)> (80.0%)
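The verification programs in these examples call three methods on a scene_graph object: get_entities, get_attributes, and get_outgoing_relations. The class below is a minimal stand-in for trying such a program locally. It assumes, based only on how the programs index the results, that entities are strings, that attributes map an attribute type to a value, and that outgoing relations map a target entity to a dict of relation types. The name ToySceneGraph, its constructor, and the sample graph content are illustrative assumptions, not the actual implementation or the actual scene graph for this image.

```python
class ToySceneGraph:
    """Hypothetical stand-in exposing the three methods the programs call."""

    def __init__(self, attributes, relations=None):
        # attributes: {entity: {attribute_type: value}}
        # relations:  {source: {target: {relation_type: value}}}
        self._attributes = attributes
        self._relations = relations or {}

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})


def execute_command(scene_graph):
    # Same logic as the ceiling-beam program in the example above.
    ceiling_material = None
    ceiling_color = None
    for entity in scene_graph.get_entities():
        if 'ceiling' in entity:
            ceiling_material = scene_graph.get_attributes(entity).get('material', None)
            ceiling_color = scene_graph.get_attributes(entity).get('color', None)
    if ceiling_material and ceiling_color:
        return f"The ceiling is made up of {ceiling_color} {ceiling_material} beams."
    return "Couldn't describe the ceiling beam."


# Assumed graph content, written by hand from the caption.
graph = ToySceneGraph(
    attributes={
        'spaceship': {'color': 'black'},
        'ceiling': {'material': 'metal', 'color': 'white'},
        'room': {'state': 'dark'},
    },
    relations={'spaceship': {'ceiling': {'spatial': 'hanging from'}}},
)
answer = execute_command(graph)
print(answer)
```

With this hand-written graph the program reproduces the reference answer; with an empty graph it falls back to its "Couldn't describe" message.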
Image
Caption

An outdoor wide-angle shot of the shadowed dirt area with patches of grass on the underside of a multi-lane highway with rows of three round cement support columns. Large bushy trees fill a woodsy area in the background, shining under the bright sunlight. A cement round walkway is visible on the left with its curved raised about a foot around the columns on the left side. The middle column on the left side is covered in vines. Clear blue skies are visible above the bridge.

Scene Graph
    Scene Graph
QA + verification programs
Q: Where can you find clear blue skies in this scene?
A: The clear blue skies are visible above the bridge.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'skies' in entity:
                if 'blue' in scene_graph.get_attributes(entity).get('color', ''):
                    spatial_relations = scene_graph.get_outgoing_relations(entity)
                    if 'bridge' in spatial_relations:
                        return "above the bridge"
        return 'Location unavailable.'

gpt-4o: In this scene, clear blue skies can be seen above the bridge. The sky is visible in the upper part of the image, above the structure of the bridge.
hscore: 94.92%
• <whole (skies)> == <whole (sky)> (90.0%)
• <color (skies, clear blue)> == <color (sky, clear blue)> (90.0%)
• <spatial (skies, bridge, above)> == <spatial (sky, bridge, above)> (90.0%)
tscore: 92.31%
• <whole (bridge)> == <whole (bridge)> (100.0%)
• <color (sky, clear blue)> == <color (skies, blue)> (90.0%)
• <whole (sky)> == <whole (skies)> (90.0%)

Q: How does the walkway interact with the columns?
A: The walkway is curved, round, and raised about a foot around the columns on the left side.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'walkway' in entity:
                attributes = scene_graph.get_attributes(entity)
                if 'curved' in attributes.get('shape', ''):
                    return 'The walkway is curved, round, and raised about a foot around the columns on the left side.'
        return 'Interaction description unavailable.'

gpt-4o: In the image, the walkway interacts with the columns by curving around them. The columns support the structure above, and the walkway is designed to navigate around these columns, creating a path that weaves through the space beneath the elevated structure. This design allows for continuous pedestrian movement while accommodating the necessary structural supports.
hscore: 80.69%
• <spatial (walkway, columns, around)> == <spatial (walkway, columns, around)> (100.0%)
• <state (walkway, raised)> == <state (walkway, designed for navigation)> (80.0%)
• <state (walkway, about a foot)> == <state (walkway, designed for navigation)> (80.0%)
tscore: 79.35%
• <whole (walkway)> == <whole (walkway)> (100.0%)
• <whole (columns)> == <whole (columns)> (100.0%)
• <state (walkway, continuous movement)> == <image> (70.0%)
Image
Caption

An overhead view of two gray sanderling birds walking on a gray, damp sand surface. The bird nearest in view toward the right side of the image is facing the top left corner of the image. The bird walking ahead is facing the top right corner of the image. There are white seashells scattered throughout the entire image; most of the shells are piled up in the bottom right and left side of the image.

Scene Graph
    Scene Graph
QA + verification programs
Q: Can you tell me where the seashells are located in the image?
A: The seashells are scattered throughout the entire image, with most shells piled up in the bottom right and left side of the image.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'seashells' in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                locations = [place for place, details in relations.items() if 'in' in details['spatial']]
                return f'Seashells are in: {", ".join(locations)}'
        return None

gpt-4o: The seashells in the image are scattered across the sandy ground, covering most of the surface. They are particularly concentrated in the lower left and upper left areas of the image. There are also seashells spread out around the two birds in the center of the image.
hscore: 91.84%
• <whole (seashells)> == <whole (seashells)> (100.0%)
• <state (seashells, scattered)> == <state (seashells, scattered)> (100.0%)
• <spatial (seashells, bottom left, piled up)> == <spatial (seashells, lower left, in)> (80.0%)
tscore: 87.61%
• <state (seashells, scattered)> == <state (seashells, scattered)> (100.0%)
• <whole (seashells)> == <whole (seashells)> (100.0%)
• <spatial (seashells, upper left, in)> == <spatial (seashells, bottom right, in)> (90.0%)

Q: Are there more than two birds in the image?
A: No, there are exactly two birds in the image.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'sanderling birds' in entity:
                attributes = scene_graph.get_attributes(entity)
                return 'No' if attributes.get('count', '0') == '==2' else 'Yes'
        return None

gpt-4o: No, there are only two birds in the image.
hscore: 44.79%
• <count (birds, ==2)> == <count (birds, ==2)> (100.0%)
• <state (building, old)> == <state (pillar, decorative carved)> (60.0%)
• <whole (building)> == <whole (pillar)> (50.0%)
tscore: 68.6%
• <count (birds, ==2)> == <count (sanderling birds, ==2)> (90.0%)
• <spatial (birds, top of the pillar, at)> == <image> (70.0%)
• <spatial (birds, pillar, near)> == <image> (70.0%)
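Counts in these tuples appear as comparison strings such as "==2" and ">1". The program above answers the question by comparing the stored string to '==2' literally, so any other count string yields 'Yes'. The sketch below shows one way such strings could be parsed and used to answer a "more than N" question. The helper names parse_count and more_than are hypothetical, and support for operators other than "==" and ">" is an assumption, since only those two appear in the examples.

```python
import re

# Comparison operator followed by a non-negative integer, e.g. "==2" or ">1".
_COUNT_RE = re.compile(r'^(==|>=|<=|>|<)\s*(\d+)$')


def parse_count(expr):
    """Return (operator, number) for a count string, or None if unparseable."""
    match = _COUNT_RE.match(expr.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def more_than(expr, threshold):
    """Answer 'is the count greater than threshold?' as 'Yes', 'No', or None.

    None means the count string does not determine the answer.
    """
    parsed = parse_count(expr)
    if parsed is None:
        return None
    op, n = parsed
    if op == '==':
        return 'Yes' if n > threshold else 'No'
    if op == '>':   # count is at least n + 1
        return 'Yes' if n >= threshold else None
    if op == '>=':  # count is at least n
        return 'Yes' if n > threshold else None
    if op == '<':   # count is at most n - 1
        return 'No' if n - 1 <= threshold else None
    if op == '<=':  # count is at most n
        return 'No' if n <= threshold else None
    return None


print(more_than('==2', 2))  # exactly two birds, so not more than two
print(more_than('>1', 2))   # "more than one" does not settle the question
```

Returning None for an undetermined case keeps a loose count like ">1" from being read as a definite answer.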

Q: What is the relative position of the bird walking ahead to the other bird?
A: The bird walking ahead is in front of the other bird.
    def execute_command(scene_graph):
        bird_positions = []
        for entity in scene_graph.get_entities():
            if 'bird' in entity:
                state = scene_graph.get_attributes(entity).get('state', '')
                if 'walk ahead' in state:
                    bird_positions.append('ahead')
                else:
                    bird_positions.append('behind')
        return 'ahead' if 'ahead' in bird_positions else 'behind'

gpt-4o: The bird walking ahead is positioned to the upper left relative to the other bird.
hscore: 91.06%
• <whole (bird)> == <whole (birds)> (90.0%)
• <whole (bird)> == <whole (birds)> (90.0%)
• <spatial (bird, bird, in front of)> == <spatial (bird, bird, relative to)> (80.0%)
tscore: 86.32%
• <state (bird, walking ahead)> == <state (bird, walk ahead)> (90.0%)
• <whole (birds)> == <whole (bird)> (90.0%)
• <count (birds, >1)> == <count (sanderling birds, ==2)> (80.0%)

Q: Can you describe the image surface in terms of color and condition?
A: The image surface is gray and damp.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'sand surface' in entity:
                attributes = scene_graph.get_attributes(entity)
                return f"The surface is {attributes.get('color', '')} and {attributes.get('state', '')}."
        return None

gpt-4o: The image shows a surface covered with numerous seashells scattered across a sandy or pebbly ground. The seashells vary in color, predominantly featuring shades of white, beige, and light brown. The ground appears to be dark gray or black sand, providing a stark contrast to the lighter-colored shells. The condition of the surface is natural and somewhat uneven, typical of a beach or coastal area. There are also two small birds visible among the shells, blending in with the surroundings.
hscore: 62.14%
• <state (image surface, damp)> == <state (surface, natural)> (60.0%)
• <color (image surface, gray)> == <color (ground, dark gray or black)> (60.0%)
tscore: 82.1%
• <state (seashells, scattered)> == <state (seashells, scattered)> (100.0%)
• <color (seashells, white)> == <color (seashells, white)> (100.0%)
• <count (birds, ==2)> == <count (sanderling birds, ==2)> (90.0%)
Image
Caption

An eye-level view of a group of multi-colored Adirondack folding chairs arranged in a circle on top of a very small hill; the chairs are in the bottom right corner of the image. A dark shadow is covering the majority of the bottom of the image over the chairs. The chairs are either colored yellow, green, orange, or red. Beyond the small hill is a parking lot with multiple grass medians. The medians have trees growing in them, creating a row of trees extending across the middle of the image. There is a gray sedan parked under one of the trees on the far left side of the image, facing the left side of the image. Behind the car is a small gray sign on the ground with the "Google" logo on it. There is another gray sedan parked in the distance on the right side of the image, visible behind the chairs and facing the right side of the image. The sky taking up the top half of the image is clear blue; there is a small group of clouds that are barely visible directly above the trees on the right side of the image.

Scene Graph
    Scene Graph
QA + verification programs
Q: Which direction is the gray sedan on the right facing?
A: The gray sedan on the right side of the image is facing the right side of the image.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "sedan" in entity:
                relations = scene_graph.get_outgoing_relations(entity)
                if "right side of the image" in relations and "on" in relations["right side of the image"].get("spatial", ""):
                    return "facing the right side of the image"
        return ""

gpt-4o: The gray sedan on the right side of the image is facing towards the left.
hscore: 99.3%
• <whole (sedan)> == <whole (sedan)> (100.0%)
• <state (sedan, on the right side of the image)> == <state (sedan, on the right side of the image)> (100.0%)
• <color (sedan, gray)> == <color (sedan, gray)> (100.0%)
tscore: 92.37%
• <color (sedan, gray)> == <color (sedan, gray)> (100.0%)
• <whole (sedan)> == <whole (sedan)> (100.0%)
• <state (sedan, facing towards the left)> == <state (sedan, parked)> (80.0%)

Q: How do the clouds appear in the scene?
A: The clouds appear as a small group that are barely visible directly above the trees on the right.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "clouds" in entity:
                attributes = scene_graph.get_attributes(entity)
                relations = scene_graph.get_outgoing_relations(entity)
                if "barely visible" in attributes.get("state", "") and "above" in relations["trees"].get("spatial", ""):
                    return "barely visible directly above the trees on the right"
        return ""

gpt-4o: In the scene, the sky is mostly clear with a deep blue color. There are a few small, white clouds visible in the distance, primarily towards the right side of the image. The clouds appear sparse and do not cover much of the sky.
hscore: 78.7%
• <count (clouds, small group)> == <count (clouds, few)> (80.0%)
• <state (clouds, barely visible)> == <state (clouds, white)> (80.0%)
• <spatial (clouds, trees, directly above)> == <spatial (clouds, right side of the image, towards)> (70.0%)
tscore: 79.27%
• <whole (clouds)> == <whole (clouds)> (100.0%)
• <spatial (clouds, right side of the image, towards)> == <spatial (clouds, right side of the image, in)> (90.0%)
• <color (sky, deep blue)> == <color (sky, clear blue)> (90.0%)
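The clouds program above indexes relations["trees"] directly. If get_outgoing_relations returns a plain dict (an assumption about the scene graph interface), that lookup would raise a KeyError whenever the clouds entity has no relation to "trees". The following is a sketch of a defensive variant that uses .get with a default instead. StubGraph and the two sample graphs are hypothetical and exist only to exercise both paths.

```python
class StubGraph:
    """Hypothetical scene graph stand-in; method names follow the programs above."""

    def __init__(self, attributes, relations):
        self._attributes = attributes  # {entity: {attribute_type: value}}
        self._relations = relations    # {source: {target: {relation_type: value}}}

    def get_entities(self):
        return list(self._attributes)

    def get_attributes(self, entity):
        return self._attributes.get(entity, {})

    def get_outgoing_relations(self, entity):
        return self._relations.get(entity, {})


def execute_command(scene_graph):
    for entity in scene_graph.get_entities():
        if "clouds" in entity:
            attributes = scene_graph.get_attributes(entity)
            relations = scene_graph.get_outgoing_relations(entity)
            # A missing "trees" relation becomes an empty dict, not a KeyError.
            tree_relation = relations.get("trees", {})
            if ("barely visible" in attributes.get("state", "")
                    and "above" in tree_relation.get("spatial", "")):
                return "barely visible directly above the trees on the right"
    return ""


with_relation = StubGraph(
    {"clouds": {"state": "barely visible"}},
    {"clouds": {"trees": {"spatial": "directly above"}}},
)
without_relation = StubGraph({"clouds": {"state": "barely visible"}}, {})

result_with = execute_command(with_relation)
result_without = execute_command(without_relation)
print(repr(result_with))
print(repr(result_without))
```

When the relation is absent, the variant returns the program's existing empty-string fallback rather than failing.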
Image
Caption

A black Volvo XC90 is parked in a gray brick driveway oriented in a three-quarters position facing the bottom right corner of the image. To the right of the driveway is a bed of rocks with bushes and a tree growing out of it. To the right of the bed of rocks is a wooden fence and a blue house on the far right side of the image. On the other side of the road is a yellow house with a black car parked in the driveway. The tops of multiple trees in the background and the tree to the right of the black Volvo are going across the middle of the image, the limb of another tree is in the foreground in the top right corner of the image. The reflection of the trees and the sky is shining off of the hood of the car. The sky is almost completely covered by a large cumulonimbus cloud. The sun is shining on the top of the cloud in the middle and on the cloud's right side.

Scene Graph
    Scene Graph
QA + verification programs
Q: What is shining off of the car's hood?
A: The reflection of the trees and the sky is shining off the car's hood.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if "car's hood" in scene_graph.get_outgoing_relations(entity):
                return "The reflection of the trees and the sky is shining off the car's hood."
        return None

gpt-4o: The reflection of the sky and clouds is shining off the car's hood. The bright, illuminated clouds in the sky are creating a noticeable reflection on the surface of the car.
hscore: 75.71%
• <spatial (car's hood, reflection)> == <spatial (car's hood, sky, reflection on)> (90.0%)
• <spatial (reflection, sky, of)> == <spatial (car's hood, sky, reflection on)> (80.0%)
• <whole (car's hood)> == <part (car's hood)> (80.0%)
tscore: 88.81%
• <whole (car)> == <whole (car)> (100.0%)
• <whole (sky)> == <whole (sky)> (100.0%)
• <part (car's hood)> == <part (car's hood)> (100.0%)

Q: How is the Volvo XC90 oriented in the driveway?
A: The Volvo XC90 is oriented in a three-quarters position facing the bottom right corner.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            attributes = scene_graph.get_attributes(entity)
            if 'Volvo XC90' in entity:
                return f"The Volvo XC90 is oriented {attributes.get('state', '')}."
        return None

gpt-4o: The Volvo XC90 is oriented facing forward in the driveway, with the front of the vehicle pointing towards the camera.
hscore: 87.71%
• <state (Volvo XC90, oriented)> == <state (Volvo XC90, oriented)> (100.0%)
• <direction (Volvo XC90, facing)> == <direction (Volvo XC90, forward)> (90.0%)
• <orientation (Volvo XC90, three> == <state (Volvo XC90, oriented)> (80.0%)
tscore: 85.53%
• <whole (Volvo XC90)> == <whole (Volvo XC90)> (100.0%)
• <state (Volvo XC90, oriented)> == <state (Volvo XC90, oriented in a three> (90.0%)
• <spatial (Volvo XC90, camera, facing)> == <spatial (Volvo XC90, bottom right corner of the image, facing)> (80.0%)

Q: What's located on the other side of the road?
A: On the other side of the road is a yellow house with a black car parked in the driveway.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'road' in scene_graph.get_outgoing_relations(entity):
                return "On the other side of the road is a yellow house with a black car parked in the driveway."
        return None

gpt-4o: On the other side of the road, there is a house with a dark roof and light-colored walls. The house has a driveway with a car parked in it. The area is surrounded by trees and greenery.
hscore: 76.84%
• <whole (house)> == <whole (house)> (100.0%)
• <part (house's driveway)> == <part (house's driveway)> (100.0%)
• <state (car, parked)> == <state (house's car, parked)> (80.0%)
tscore: 73.52%
• <whole (house)> == <whole (house)> (100.0%)
• <whole (trees and greenery)> == <whole (trees)> (80.0%)
• <part (house's driveway)> == <whole (driveway)> (70.0%)

Q: Identify the material used in making the driveway.
A: The driveway is made of gray brick.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'driveway' in entity:
                attributes = scene_graph.get_attributes(entity)
                return f"The driveway is made of {attributes.get('material', '')}."
        return None

gpt-4o: The driveway in the image appears to be made of interlocking concrete pavers. These pavers are commonly used for driveways due to their durability, aesthetic appeal, and ease of maintenance.
hscore: 56.89%
• <material (driveway, brick)> == <material (driveway, interlocking concrete pavers)> (60.0%)
• <color (driveway, gray)> == <whole (driveway)> (40.0%)
tscore: 55.61%
• <whole (driveway)> == <whole (driveway)> (100.0%)
• <material (driveway, interlocking concrete pavers)> == <material (driveway, brick)> (60.0%)
• <state (interlocking concrete pavers, commonly used for driveways)> == <material (driveway, brick)> (40.0%)

Q: What's the relationship between the tree and the rocks?
A: The tree is growing on the bed of rocks.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            if 'tree' in entity:
                outgoing_relations = scene_graph.get_outgoing_relations(entity)
                if 'rocks' in outgoing_relations:
                    return "The tree is growing on the bed of rocks."
        return None

gpt-4o: In the image, the tree and the rocks are part of the landscaping around the driveway. The tree is situated to the right of the driveway, providing shade and aesthetic appeal. The rocks are used as a ground cover on either side of the driveway, likely for decorative purposes and to reduce soil erosion. The combination of the tree and rocks creates a natural and visually pleasing environment around the driveway and the house.
hscore: 85.98%
• <spatial (tree, bed of rocks, on)> == <spatial (tree, rocks, together)> (90.0%)
• <whole (bed of rocks)> == <whole (rocks)> (80.0%)
tscore: 76.57%
• <whole (tree)> == <whole (tree)> (100.0%)
• <whole (rocks)> == <whole (rocks)> (100.0%)
• <spatial (tree, rocks, together)> == <spatial (tree, rocks, on)> (90.0%)

Q: Describe the sky in the image.
A: The sky is almost completely covered by a large cumulonimbus cloud with the sun shining on its top and right side.
    def execute_command(scene_graph):
        for entity in scene_graph.get_entities():
            attributes = scene_graph.get_attributes(entity)
            if 'sky' in entity:
                return f"The sky is {attributes.get('state', '')} by a large cumulonimbus cloud."
        return None

gpt-4o: The sky in the image features a mix of blue and gray tones, with large, fluffy clouds. The clouds are illuminated by sunlight, giving them a bright, golden appearance at the top, which contrasts with the darker, shadowed areas below. The overall effect is a dynamic and visually striking sky, suggesting either early morning or late afternoon lighting.
hscore: 65.16%
• <spatial (sun, cumulonimbus cloud, shining on)> == <spatial (clouds, sunlight, at the top)> (80.0%)
• <spatial (cumulonimbus cloud, sky, covers)> == <spatial (clouds, sunlight, at the top)> (70.0%)
• <whole (sun)> == <whole (sky)> (70.0%)
tscore: 75.78%
• <whole (sky)> == <whole (sky)> (100.0%)
• <spatial (clouds, sunlight, at the top)> == <spatial (sun, cloud, on)> (80.0%)
• <spatial (clouds, shadowed areas, below)> == <spatial (cloud, sky, cover)> (80.0%)