404
+ +Page not found
+ + +diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..e69de29b diff --git a/404.html b/404.html new file mode 100644 index 00000000..1388408e --- /dev/null +++ b/404.html @@ -0,0 +1,296 @@ + + +
+ + + + +Page not found
+ + +This project has adopted the Microsoft Open Source Code of Conduct.
+Resources:
+This project welcomes contributions and suggestions. Most contributions require you to +agree to a Contributor License Agreement (CLA) declaring that you have the right to, +and actually do, grant us the rights to use your contribution. For details, visit +https://cla.microsoft.com.
+When you submit a pull request, a CLA-bot will automatically determine whether you need +to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the +instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
+Note
+You should sunmit your pull request to the pre-release
branch, not the main
branch.
This project has adopted the Microsoft Open Source Code of Conduct. +For more information see the Code of Conduct FAQ +or contact opencode@microsoft.com with any additional questions or comments.
+ +By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices:
+The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. These screenshots will be processed and sent to the GPT model for inference.
+It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft.
+By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process.
+Microsoft has implemented security measures to safeguard the action execution. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. Ensure that you are running the latest security updates on your system.
+You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code.
+The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model.
+Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo.
+If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary.
+Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes.
+By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.
+ +Copyright (c) Microsoft Corporation.
+Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE.
+ +This project uses GitHub Issues to track bugs and feature requests. Please search the existing +issues before filing new issues to avoid duplicates. For new issues, file your bug or +feature request as a new Issue.
+You may use GitHub Issues to raise questions, bug reports, and feature requests.
+For help and questions about using this project, please please contact ufo-agent@microsoft.com.
+Support for this PROJECT or PRODUCT is limited to the resources listed above.
+ +The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings.
+To activate the icon control filtering, you need to add ICON
to the CONTROL_FILTER
list in the config_dev.yaml
file. Below is the detailed icon control filter configuration in the config_dev.yaml
file:
CONTROL_FILTER
: A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add ICON
to the list.CONTROL_FILTER_TOP_K_ICON
: The number of controls to keep after filtering.CONTROL_FILTER_MODEL_ICON_NAME
: The control filter model name for icon similarity. By default, it is set to "clip-ViT-B-32".
+ Bases: BasicControlFilter
A class that represents a icon model for control filtering.
+ + + + + + + + + +control_filter(control_dicts, cropped_icons_dict, plans, top_k)
+
+Filters control items based on their scores and returns the top-k items.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 +268 +269 +270 +271 +272 +273 +274 |
|
control_filter_score(control_icon, plans)
+
+Calculates the score of a control icon based on its similarity to the given keywords.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
240 +241 +242 +243 +244 +245 +246 +247 +248 +249 +250 |
|
There may be many controls items in the application, which may not be relevant to the task. UFO can filter out the irrelevant controls and only focus on the relevant ones. This filtering process can reduce the complexity of the task.
+Execept for configuring the control types for selection on CONTROL_LIST
in config_dev.yaml
, UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currerntly support the following filtering methods:
Filtering Method | +Description | +
---|---|
Text |
+Filter the controls based on the control text. | +
Semantic |
+Filter the controls based on the semantic similarity. | +
Icon |
+Filter the controls based on the control icon image. | +
You can activate the control filtering by setting the CONTROL_FILTER
in the config_dev.yaml
file. The CONTROL_FILTER
is a list of filtering methods that you want to apply to the controls, which can be TEXT
, SEMANTIC
, or ICON
.
You can configure multiple filtering methods in the CONTROL_FILTER
list.
The implementation of the control filtering is base on the BasicControlFilter
class located in the ufo/automator/ui_control/control_filter.py
file. Concrete filtering class inherit from the BasicControlFilter
class and implement the control_filter
method to filter the controls based on the specific filtering method.
BasicControlFilter represents a model for filtering control items.
+ + + + + + + + + +__new__(model_path)
+
+Creates a new instance of BasicControlFilter.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
72 +73 +74 +75 +76 +77 +78 +79 +80 +81 +82 |
|
control_filter(control_dicts, plans, **kwargs)
+
+
+ abstractmethod
+
+
+Calculates the cosine similarity between the embeddings of the given keywords and the control item.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
104 +105 +106 +107 +108 +109 +110 +111 +112 |
|
cos_sim(embedding1, embedding2)
+
+
+ staticmethod
+
+
+Computes the cosine similarity between two embeddings.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 |
|
get_embedding(content)
+
+Encodes the given object into an embedding.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
95 + 96 + 97 + 98 + 99 +100 +101 +102 |
|
load_model(model_path)
+
+
+ staticmethod
+
+
+Loads the model from the given model path.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
84 +85 +86 +87 +88 +89 +90 +91 +92 +93 |
|
plans_to_keywords(plans)
+
+
+ staticmethod
+
+
+Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 |
|
remove_stopwords(keywords)
+
+
+ staticmethod
+
+
+Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords').
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 |
|
The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings.
+To activate the semantic control filtering, you need to add SEMANTIC
to the CONTROL_FILTER
list in the config_dev.yaml
file. Below is the detailed sematic control filter configuration in the config_dev.yaml
file:
CONTROL_FILTER
: A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add SEMANTIC
to the list.CONTROL_FILTER_TOP_K_SEMANTIC
: The number of controls to keep after filtering.CONTROL_FILTER_MODEL_SEMANTIC_NAME
: The control filter model name for semantic similarity. By default, it is set to "all-MiniLM-L6-v2".
+ Bases: BasicControlFilter
A class that represents a semantic model for control filtering.
+ + + + + + + + + +control_filter(control_dicts, plans, top_k)
+
+Filters control items based on their similarity to a set of keywords.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 +220 +221 +222 +223 +224 +225 +226 +227 +228 +229 +230 +231 +232 |
|
control_filter_score(control_text, plans)
+
+Calculates the score for a control item based on the similarity between its text and a set of keywords.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
197 +198 +199 +200 +201 +202 +203 +204 +205 +206 +207 |
|
The text control filter is a method to filter the controls based on the control text. The agent's plan on the current step usually contains some keywords or phrases. This method filters the controls based on the matching between the control text and the keywords or phrases in the agent's plan.
+To activate the text control filtering, you need to add TEXT
to the CONTROL_FILTER
list in the config_dev.yaml
file. Below is the detailed text control filter configuration in the config_dev.yaml
file:
CONTROL_FILTER
: A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add TEXT
to the list.CONTROL_FILTER_TOP_K_PLAN
: The number of agent's plan keywords or phrases to use for filtering the controls.A class that provides methods for filtering control items based on plans.
+ + + + + + + + + +control_filter(control_dicts, plans)
+
+
+ staticmethod
+
+
+Filters control items based on keywords.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/control_filter.py
171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 |
|
Sometimes, UFO may need additional context or information to complete a task. These information are important and customized for each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.
+Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again.
+We currently implement the customization feature in the HostAgent
class. When the HostAgent
needs additional information, it will transit to the PENDING
state and ask the user for the information. The user will provide the information, and the HostAgent
will save it in the local memory base for future reference. The saved information is stored in the blackboard
and can be accessed by all agents in the session.
Note
+The customization memory base is only saved in a local file. These information will not upload to the cloud or any other storage to protect the user's privacy.
+You can configure the customization feature by setting the following field in the config_dev.yaml
file.
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
USE_CUSTOMIZATION |
+Whether to enable the customization. | +Boolean | +True | +
QA_PAIR_FILE |
+The path for the historical QA pairs. | +String | +"customization/historical_qa.txt" | +
QA_PAIR_NUM |
+The number of QA pairs for the customization. | +Integer | +20 | +
The Follower mode is a feature of UFO that the agent follows a list of pre-defined steps in natural language to take actions on applications. Different from the normal mode, this mode creates an FollowerAgent
that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging and software testing or verification.
Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:
+Field | +Description | +Type | +
---|---|---|
task | +The task description. | +String | +
steps | +The list of steps for the agent to follow. | +List of Strings | +
object | +The application or file to interact with. | +String | +
Below is an example of a plan file:
+{
+ "task": "Type in a text of 'Test For Fun' with heading 1 level",
+ "steps":
+ [
+ "1.type in 'Test For Fun'",
+ "2.Select the 'Test For Fun' text",
+ "3.Click 'Home' tab to show the 'Styles' ribbon tab",
+ "4.Click 'Styles' ribbon tab to show the style 'Heading 1'",
+ "5.Click 'Heading 1' style to apply the style to the selected text"
+ ],
+ "object": "draft.docx"
+}
+
+Note
+The object
field is the application or file that the agent will interact with. The object must be active (can be minimized) when starting the Follower mode.
To start the Follower mode, run the following command:
+# assume you are in the cloned UFO folder
+python ufo.py --task_name {task_name} --mode follower --plan {plan_file}
+
+Tip
+Replace {task_name}
with the name of the task and {plan_file}
with the path to the plan file.
You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command:
+# assume you are in the cloned UFO folder
+python ufo.py --task_name {task_name} --mode follower --plan {plan_folder}
+
+UFO will automatically detect the plan files in the folder and run them one by one.
+Tip
+Replace {task_name}
with the name of the task and {plan_folder}
with the path to the folder containing plan files.
You may want to evaluate the task
is completed successfully or not by following the plan. UFO will call the EvaluationAgent
to evaluate the task if EVA_SESSION
is set to True
in the config_dev.yaml
file.
You can check the evaluation log in the logs/{task_name}/evaluation.log
file.
The follower mode employs a PlanReader
to parse the plan file and create a FollowerSession
to follow the plan.
The PlanReader
is located in the ufo/module/sessions/plan_reader.py
file.
The reader for a plan file.
+ +Initialize a plan reader.
+ + +Parameters: | +
+
|
+
---|
module/sessions/plan_reader.py
17 +18 +19 +20 +21 +22 +23 +24 +25 |
|
get_host_agent_request()
+
+Get the request for the host agent.
+ + +Returns: | +
+
|
+
---|
module/sessions/plan_reader.py
64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 |
|
get_initial_request()
+
+Get the initial request in the plan.
+ + +Returns: | +
+
|
+
---|
module/sessions/plan_reader.py
51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 |
|
get_operation_object()
+
+Get the operation object in the step.
+ + +Returns: | +
+
|
+
---|
module/sessions/plan_reader.py
43 +44 +45 +46 +47 +48 +49 |
|
get_steps()
+
+Get the steps in the plan.
+ + +Returns: | +
+
|
+
---|
module/sessions/plan_reader.py
35 +36 +37 +38 +39 +40 +41 |
|
get_task()
+
+Get the task name.
+ + +Returns: | +
+
|
+
---|
module/sessions/plan_reader.py
27 +28 +29 +30 +31 +32 +33 |
|
next_step()
+
+Get the next step in the plan.
+ + +Returns: | +
+
|
+
---|
module/sessions/plan_reader.py
79 +80 +81 +82 +83 +84 +85 +86 +87 +88 +89 |
|
task_finished()
+
+Check if the task is finished.
+ + +Returns: | +
+
|
+
---|
module/sessions/plan_reader.py
91 +92 +93 +94 +95 +96 +97 |
|
The FollowerSession
is also located in the ufo/module/sessions/session.py
file.
+ Bases: BaseSession
A session for following a list of plan for action taken. +This session is used for the follower agent, which accepts a plan file to follow using the PlanReader.
+ +Initialize a session.
+ + +Parameters: | +
+
|
+
---|
module/sessions/session.py
197 +198 +199 +200 +201 +202 +203 +204 +205 +206 +207 +208 +209 +210 |
|
create_new_round()
+
+Create a new round.
+ +module/sessions/session.py
220 +221 +222 +223 +224 +225 +226 +227 +228 +229 +230 +231 +232 +233 +234 +235 +236 +237 +238 +239 +240 +241 +242 +243 +244 +245 +246 +247 +248 +249 +250 +251 +252 +253 +254 +255 |
|
next_request()
+
+Get the request for the new round.
+ +module/sessions/session.py
257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 +268 +269 +270 +271 |
|
request_to_evaluate()
+
+Check if the session should be evaluated.
+ + +Returns: | +
+
|
+
---|
module/sessions/session.py
273 +274 +275 +276 +277 +278 +279 |
|
When UFO successfully completes a task, user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future.
+ExperienceSummarizer
EXPERIENCE_SAVED_PATH
as specified in the config_dev.yaml
filegraph TD;
+ A[Complete Session] --> B[Ask User to Save Experience]
+ B --> C[User Chooses to Save]
+ C --> D[Summarize with ExperienceSummarizer]
+ D --> E[Save in EXPERIENCE_SAVED_PATH]
+ F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience]
+ G --> H[Generate Plan]
+
+Configure the following parameters to allow UFO to use the RAG from its self-experience:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_EXPERIENCE |
+Whether to use the RAG from its self-experience | +Boolean | +False | +
RAG_EXPERIENCE_RETRIEVED_TOPK |
+The topk for the offline retrieved documents | +Integer | +5 | +
The ExperienceSummarizer
class is located in the ufo/experience/experience_summarizer.py
file. The ExperienceSummarizer
class provides the following methods to summarize the experience:
The ExperienceSummarizer class is the summarizer for the experience learning.
+ +Initialize the ApplicationAgentPrompter.
+ + +Parameters: | +
+
|
+
---|
experience/summarizer.py
22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 |
|
build_prompt(log_partition)
+
+Build the prompt.
+ + +Parameters: | +
+
|
+
---|
experience/summarizer.py
41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 |
|
create_or_update_vector_db(summaries, db_path)
+
+
+ staticmethod
+
+
+Create or update the vector database.
+ + +Parameters: | +
+
|
+
---|
experience/summarizer.py
163 +164 +165 +166 +167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 |
|
create_or_update_yaml(summaries, yaml_path)
+
+
+ staticmethod
+
+
+Create or update the YAML file.
+ + +Parameters: | +
+
|
+
---|
experience/summarizer.py
127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 |
|
get_summary(prompt_message)
+
+Get the summary.
+ + +Parameters: | +
+
|
+
---|
experience/summarizer.py
63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 +80 +81 +82 +83 +84 +85 +86 +87 +88 +89 +90 +91 +92 +93 +94 +95 +96 +97 |
|
get_summary_list(logs)
+
+Get the summary list.
+ + +Parameters: | +
+
|
+
---|
experience/summarizer.py
99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 |
|
read_logs(log_path)
+
+
+ staticmethod
+
+
+Read the log.
+ + +Parameters: | +
+
|
+
---|
experience/summarizer.py
117 +118 +119 +120 +121 +122 +123 +124 +125 |
|
The ExperienceRetriever
class is located in the ufo/rag/retriever.py
file. The ExperienceRetriever
class provides the following methods to retrieve the experience:
+ Bases: Retriever
Class to create experience retrievers.
+ +Create a new ExperienceRetriever.
+ + +Parameters: | +
+
|
+
---|
rag/retriever.py
131 +132 +133 +134 +135 +136 |
|
get_indexer(db_path)
+
+Create an experience indexer.
+ + +Parameters: | +
+
|
+
---|
rag/retriever.py
138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 |
|
UFO provides the capability to reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge for niche tasks or applications which beyond the AppAgent
's knowledge.
Upon receiving a request, the AppAgent
constructs a Bing search query based on the request and retrieves the search results from Bing. The AppAgent
then extracts the relevant information from the top-k search results from Bing and generates a plan based on the retrieved information.
To use the Bing search, you need to obtain a Bing API key. You can follow the instructions on the Microsoft Azure Bing Search API to get the API key.
+Configure the following parameters to allow UFO to use online Bing search for the decision-making process:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_ONLINE_SEARCH |
+Whether to use the Bing search | +Boolean | +False | +
BING_API_KEY |
+The Bing search API key | +String | +"" | +
RAG_ONLINE_SEARCH_TOPK |
+The topk for the online search | +Integer | +5 | +
RAG_ONLINE_RETRIEVED_TOPK |
+The topk for the online retrieved searched results | +Integer | +1 | +
+ Bases: Retriever
Class to create online retrievers.
+ +Create a new OfflineDocRetriever. +:query: The query to create an indexer for. +:top_k: The number of documents to retrieve.
+ + + + + + +rag/retriever.py
162 +163 +164 +165 +166 +167 +168 +169 |
|
get_indexer(top_k)
+
+Create an online search indexer.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
rag/retriever.py
171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 |
|
Here is the polished document for your Python code project:
+For complex tasks, users can demonstrate the task using Step Recorder to record the action trajectories. UFO can learn from these user demonstrations to improve the AppAgent's performance.
+UFO use the Step Recorder tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The DemonstrationSummarizer
class extracts and summarizes the demonstration. The summarized demonstration is saved in the DEMONSTRATION_SAVED_PATH
as specified in the config_dev.yaml
file. When the AppAgent encounters a similar task, the DemonstrationRetriever
class retrieves the saved demonstration from the demonstration database and generates a plan based on the retrieved demonstration.
Info
+You can find how to record the task and action trajectories using the Step Recorder tool in the User Demonstration Provision document.
+You can find a demo video of learning from user demonstrations:
+ + +Please follow the steps in the User Demonstration Provision document to provide user demonstrations.
+Configure the following parameters to allow UFO to use RAG from user demonstrations:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_DEMONSTRATION |
+Whether to use RAG from user demonstrations | +Boolean | +False | +
RAG_DEMONSTRATION_RETRIEVED_TOPK |
+The top K documents to retrieve offline | +Integer | +5 | +
RAG_DEMONSTRATION_COMPLETION_N |
+The number of completion choices for the demonstration result | +Integer | +3 | +
The DemonstrationSummarizer
class is located in the record_processor/summarizer/summarizer.py
file. The DemonstrationSummarizer
class provides methods to summarize the demonstration:
The DemonstrationSummarizer class is the summarizer for the demonstration learning. +It summarizes the demonstration record to a list of summaries, +and save the summaries to the YAML file and the vector database. +A sample of the summary is as follows: +{ + "example": { + "Observation": "Word.exe is opened.", + "Thought": "The user is trying to create a new file.", + "ControlLabel": "1", + "ControlText": "Sample Control Text", + "Function": "CreateFile", + "Args": "filename='new_file.txt'", + "Status": "Success", + "Plan": "Create a new file named 'new_file.txt'.", + "Comment": "The user successfully created a new file." + }, + "Tips": "You can use the 'CreateFile' function to create a new file." +}
+ +Initialize the DemonstrationSummarizer.
+ + +Parameters: | +
+
|
+
---|
summarizer/summarizer.py
39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 |
|
__build_prompt(demo_record)
+
+Build the prompt by the user demonstration record.
+ + +Parameters: | +
+
|
+
---|
summarizer/summarizer.py
81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 |
|
__parse_response(response_string)
+
+Parse the response string to a dict of summary.
+ + +Parameters: | +
+
|
+
---|
summarizer/summarizer.py
105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 +133 +134 |
|
create_or_update_vector_db(summaries, db_path)
+
+
+ staticmethod
+
+
+Create or update the vector database.
+ + +Parameters: | +
+
|
+
---|
summarizer/summarizer.py
171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 +192 +193 +194 |
|
create_or_update_yaml(summaries, yaml_path)
+
+
+ staticmethod
+
+
+Create or update the YAML file.
+ + +Parameters: | +
+
|
+
---|
summarizer/summarizer.py
136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 +164 +165 +166 +167 +168 +169 |
|
get_summary_list(record)
+
+Get the summary list for a record
+ + +Parameters: | +
+
|
+
---|
summarizer/summarizer.py
60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 |
|
The DemonstrationRetriever
class is located in the rag/retriever.py
file. The DemonstrationRetriever
class provides methods to retrieve the demonstration:
+ Bases: Retriever
Class to create demonstration retrievers.
+ +Create a new DemonstrationRetriever. +:db_path: The path to the database.
+ + + + + + +rag/retriever.py
198 +199 +200 +201 +202 +203 |
|
get_indexer(db_path)
+
+Create a demonstration indexer. +:db_path: The path to the database.
+ +rag/retriever.py
205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 +220 +221 |
|
User or applications can provide help documents to the AppAgent to reinforce its capabilities. The AppAgent can retrieve knowledge from these documents to improve its understanding of the task, generate high-quality plans, and interact more efficiently with the application. You can find how to provide help documents to the AppAgent in the Help Document Provision section.
+The help documents are provided in a format of task-solution pairs. Upon receiving a request, the AppAgent retrieves the relevant help documents by matching the request with the task descriptions in the help documents and generates a plan based on the retrieved solutions.
+Note
+Since the retrieved help documents may not be relevant to the request, the AppAgent
will only take them as references to generate the plan.
Follow the steps below to activate the learning from help documents:
+Please follow the steps in the Help Document Provision document to provide help documents to the AppAgent.
+Configure the following parameters in the config.yaml
file to activate the learning from help documents:
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_OFFLINE_DOCS |
+Whether to use the offline RAG | +Boolean | +False | +
RAG_OFFLINE_DOCS_RETRIEVED_TOPK |
+The topk for the offline retrieved documents | +Integer | +1 | +
+ Bases: Retriever
Class to create offline retrievers.
+ +Create a new OfflineDocRetriever. +:appname: The name of the application.
+ + + + + + +rag/retriever.py
78 +79 +80 +81 +82 +83 +84 +85 |
|
get_indexer(path)
+
+Load the retriever.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
rag/retriever.py
99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 |
|
get_offline_indexer_path()
+
+Get the path to the offline indexer.
+ + +Returns: | +
+
|
+
---|
rag/retriever.py
87 +88 +89 +90 +91 +92 +93 +94 +95 +96 +97 |
|
UFO provides versatile mechanisms to reinforce the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other techniques. These enhance the AppAgent's understanding of the task, improving the quality of the generated plans, and increasing the efficiency of the AppAgent's interactions with the application.
+We currently support the following reinforcement methods:
+Reinforcement Method | +Description | +
---|---|
Learning from Help Documents | +Reinforce the AppAgent by retrieving knowledge from help documents. | +
Learning from Bing Search | +Reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge. | +
Learning from Self-Experience | +Reinforce the AppAgent by learning from its own successful experiences. | +
Learning from User Demonstrations | +Reinforce the AppAgent by learning from action trajectories demonstrated by users. | +
UFO provides the knowledge to the AppAgent through a context_provision
method defined in the AppAgent
class:
def context_provision(self, request: str = "") -> None:
+ """
+ Provision the context for the app agent.
+ :param request: The Bing search query.
+ """
+
+ # Load the offline document indexer for the app agent if available.
+ if configs["RAG_OFFLINE_DOCS"]:
+ utils.print_with_color(
+ "Loading offline help document indexer for {app}...".format(
+ app=self._process_name
+ ),
+ "magenta",
+ )
+ self.build_offline_docs_retriever()
+
+ # Load the online search indexer for the app agent if available.
+
+ if configs["RAG_ONLINE_SEARCH"] and request:
+ utils.print_with_color("Creating a Bing search indexer...", "magenta")
+ self.build_online_search_retriever(
+ request, configs["RAG_ONLINE_SEARCH_TOPK"]
+ )
+
+ # Load the experience indexer for the app agent if available.
+ if configs["RAG_EXPERIENCE"]:
+ utils.print_with_color("Creating an experience indexer...", "magenta")
+ experience_path = configs["EXPERIENCE_SAVED_PATH"]
+ db_path = os.path.join(experience_path, "experience_db")
+ self.build_experience_retriever(db_path)
+
+ # Load the demonstration indexer for the app agent if available.
+ if configs["RAG_DEMONSTRATION"]:
+ utils.print_with_color("Creating an demonstration indexer...", "magenta")
+ demonstration_path = configs["DEMONSTRATION_SAVED_PATH"]
+ db_path = os.path.join(demonstration_path, "demonstration_db")
+ self.build_human_demonstration_retriever(db_path)
+
+The context_provision
method loads the offline document indexer, online search indexer, experience indexer, and demonstration indexer for the AppAgent based on the configuration settings in the config_dev.yaml
file.
UFO employs the Retriever
class located in the ufo/rag/retriever.py
file to retrieve knowledge from various sources. The Retriever
class provides the following methods to retrieve knowledge:
+ Bases: ABC
Class to retrieve documents.
+ +Create a new Retriever.
+ + + + + + +rag/retriever.py
42 +43 +44 +45 +46 +47 +48 +49 |
|
get_indexer()
+
+
+ abstractmethod
+
+
+Get the indexer.
+ + +Returns: | +
+
|
+
---|
rag/retriever.py
51 +52 +53 +54 +55 +56 +57 |
|
retrieve(query, top_k, filter=None)
+
+Retrieve the document from the given query. +:filter: The filter to apply to the retrieved documents.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
rag/retriever.py
59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 |
|
An AppAgent
is responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. The AppAgent
is created by the HostAgent
to fulfill a sub-task within a Round
. The AppAgent
is responsible for executing the necessary actions within the application to fulfill the user's request. The AppAgent
has the following features:
AppAgent
recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request.AppAgent
is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases, and demonstration libraries, making the agent an application "expert".AppAgent
is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and "Copilot".Tip
+You can find the how to enhance the AppAgent
with external knowledge bases and demonstration libraries in the Reinforcing AppAgent documentation.
We show the framework of the AppAgent
in the following diagram:
To interact with the application, the AppAgent
receives the following inputs:
Input | +Description | +Type | +
---|---|---|
User Request | +The user's request in natural language. | +String | +
Sub-Task | +The sub-task description to be executed by the AppAgent , assigned by the HostAgent . |
+String | +
Current Application | +The name of the application to be interacted with. | +String | +
Control Information | +Index, name and control type of available controls in the application. | +List of Dictionaries | +
Application Screenshots | +Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). | +List of Strings | +
Previous Sub-Tasks | +The previous sub-tasks and their completion status. | +List of Strings | +
Previous Plan | +The previous plan for the following steps. | +List of Strings | +
HostAgent Message | +The message from the HostAgent for the completion of the sub-task. |
+String | +
Retrived Information | +The retrieved information from external knowledge bases or demonstration libraries. | +String | +
Blackboard | +The shared memory space for storing and sharing information among the agents. | +Dictionary | +
Below is an example of the annotated application screenshot with labeled controls. This follow the Set-of-Mark paradigm.
+By processing these inputs, the AppAgent
determines the necessary actions to fulfill the user's request within the application.
Tip
+Whether to concatenate the clean screenshot and annotated screenshot can be configured in the CONCAT_SCREENSHOT
field in the config_dev.yaml
file.
Tip
+Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the INCLUDE_LAST_SCREENSHOT
field in the config_dev.yaml
file.
With the inputs provided, the AppAgent
generates the following outputs:
Output | +Description | +Type | +
---|---|---|
Observation | +The observation of the current application screenshots. | +String | +
Thought | +The logical reasoning process of the AppAgent . |
+String | +
ControlLabel | +The index of the selected control to interact with. | +String | +
ControlText | +The name of the selected control to interact with. | +String | +
Function | +The function to be executed on the selected control. | +String | +
Args | +The arguments required for the function execution. | +List of Strings | +
Status | +The status of the agent, mapped to the AgentState . |
+String | +
Plan | +The plan for the following steps after the current action. | +List of Strings | +
Comment | +Additional comments or information provided to the user. | +String | +
SaveScreenshot | +The flag to save the screenshot of the application to the blackboard for future reference. |
+Boolean | +
Below is an example of the AppAgent
output:
{
+ "Observation": "Application screenshot",
+ "Thought": "Logical reasoning process",
+ "ControlLabel": "Control index",
+ "ControlText": "Control name",
+ "Function": "Function name",
+ "Args": ["arg1", "arg2"],
+ "Status": "AgentState",
+ "Plan": ["Step 1", "Step 2"],
+ "Comment": "Additional comments",
+ "SaveScreenshot": true
+}
+
+Info
+The AppAgent
output is formatted as a JSON object by LLMs and can be parsed by the json.loads
method in Python.
The AppAgent
state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the ufo/agents/states/app_agent_states.py
module. The states include:
State | +Description | +
---|---|
CONTINUE |
+The AppAgent continues executing the current action. |
+
FINISH |
+The AppAgent has completed the current sub-task. |
+
ERROR |
+The AppAgent encountered an error during execution. |
+
FAIL |
+The AppAgent believes the current sub-task is unachievable. |
+
CONFIRM |
+The AppAgent is confirming the user's input or action. |
+
SCREENSHOT |
+The AppAgent believes the current screenshot is not clear in annotating the control and requests a new screenshot. |
+
The state machine diagram for the AppAgent
is shown below:
The AppAgent
progresses through these states to execute the necessary actions within the application and fulfill the sub-task assigned by the HostAgent
.
The AppAgent
is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. The AppAgent
leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance.
User can provide help documents to the AppAgent
to enhance its comprehension of the application and improve its performance in the config.yaml
file.
Tip
+Please find details configuration in the documentation.
+Tip
+You may also refer to the here for how to provide help documents to the AppAgent
.
In the AppAgent
, it calls the build_offline_docs_retriever
to build a help document retriever, and uses the retrived_documents_prompt_helper
to contruct the prompt for the AppAgent
.
Since help documents may not cover all the information or the information may be outdated, the AppAgent
can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the config.yaml
file.
Tip
+Please find details configuration in the documentation.
+Tip
+You may also refer to the here for the implementation of Bing search in the AppAgent
.
In the AppAgent
, it calls the build_online_search_retriever
to build a Bing search retriever, and uses the retrived_documents_prompt_helper
to contruct the prompt for the AppAgent
.
You may save successful action trajectories in the AppAgent
to learn from self-demonstrations and improve its performance. After the completion of a session
, the AppAgent
will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the config.yaml
file.
Tip
+You can find details of the configuration in the documentation.
+Tip
+You may also refer to the here for the implementation of self-demonstrations in the AppAgent
.
In the AppAgent
, it calls the build_experience_retriever
to build a self-demonstration retriever, and uses the rag_experience_retrieve
to retrieve the demonstration for the AppAgent
.
In addition to self-demonstrations, you can also provide human demonstrations to the AppAgent
to enhance its performance by using the Step Recorder tool built in the Windows OS. The AppAgent
will learn from the human demonstrations to improve its performance and achieve better personalization. The use of human demonstrations can be configured in the config.yaml
file.
Tip
+You can find details of the configuration in the documentation.
+Tip
+You may also refer to the here for the implementation of human demonstrations in the AppAgent
.
In the AppAgent
, it calls the build_human_demonstration_retriever
to build a human demonstration retriever, and uses the rag_experience_retrieve
to retrieve the demonstration for the AppAgent
.
The AppAgent
is equipped with a versatile skill set to support comprehensive automation within the application by calling the create_puppeteer_interface
method. The skills include:
Skill | +Description | +
---|---|
UI Automation | +Mimicking user interactions with the application UI controls using the UI Automation and Win32 API. |
+
Native API | +Accessing the application's native API to execute specific functions and actions. | +
In-App Agent | +Leveraging the in-app agent to interact with the application's internal functions and features. | +
By utilizing these skills, the AppAgent
can efficiently interact with the application and fulfill the user's request. You can find more details in the Automator documentation and the code in the ufo/automator
module.
+ Bases: BasicAgent
The AppAgent class that manages the interaction with the application.
+ +Initialize the AppAgent. +:name: The name of the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/app_agent.py
28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 |
|
status_manager: AppAgentStatus
+
+
+ property
+
+
+Get the status manager.
+build_experience_retriever(db_path)
+
+Build the experience retriever.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/app_agent.py
346 +347 +348 +349 +350 +351 +352 +353 +354 |
|
build_human_demonstration_retriever(db_path)
+
+Build the human demonstration retriever.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/app_agent.py
356 +357 +358 +359 +360 +361 +362 +363 +364 |
|
build_offline_docs_retriever()
+
+Build the offline docs retriever.
+ +agents/agent/app_agent.py
328 +329 +330 +331 +332 +333 +334 |
|
build_online_search_retriever(request, top_k)
+
+Build the online search retriever.
+ + +Parameters: | +
+
|
+
---|
agents/agent/app_agent.py
336 +337 +338 +339 +340 +341 +342 +343 +344 |
|
context_provision(request='')
+
+Provision the context for the app agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/app_agent.py
366 +367 +368 +369 +370 +371 +372 +373 +374 +375 +376 +377 +378 +379 +380 +381 +382 +383 +384 +385 +386 +387 +388 +389 +390 +391 +392 +393 +394 +395 +396 +397 +398 +399 +400 +401 +402 |
|
create_puppeteer_interface()
+
+Create the Puppeteer interface to automate the app.
+ + +Returns: | +
+
|
+
---|
agents/agent/app_agent.py
299 +300 +301 +302 +303 +304 |
|
external_knowledge_prompt_helper(request, offline_top_k, online_top_k)
+
+Retrieve the external knowledge and construct the prompt.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/app_agent.py
200 +201 +202 +203 +204 +205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 +220 +221 +222 +223 +224 +225 +226 +227 +228 +229 +230 +231 +232 +233 +234 +235 +236 +237 +238 +239 +240 +241 |
|
get_prompter(is_visual, main_prompt, example_prompt, api_prompt, app_root_name)
+
+Get the prompt for the agent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/app_agent.py
66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 +80 +81 +82 +83 +84 +85 |
|
message_constructor(dynamic_examples, dynamic_tips, dynamic_knowledge, image_list, control_info, prev_subtask, plan, request, subtask, host_message, include_last_screenshot)
+
+Construct the prompt message for the AppAgent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/app_agent.py
87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 |
|
print_response(response_dict)
+
+Print the response.
+ + +Parameters: | +
+
|
+
---|
agents/agent/app_agent.py
145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 +164 +165 +166 +167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 |
|
process(context)
+
+Process the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/app_agent.py
290 +291 +292 +293 +294 +295 +296 +297 |
|
process_comfirmation()
+
+Process the user confirmation.
+ + +Returns: | +
+
|
+
---|
agents/agent/app_agent.py
306 +307 +308 +309 +310 +311 +312 +313 +314 +315 +316 +317 +318 +319 |
|
rag_demonstration_retrieve(request, demonstration_top_k)
+
+Retrieving demonstration examples for the user request.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/app_agent.py
268 +269 +270 +271 +272 +273 +274 +275 +276 +277 +278 +279 +280 +281 +282 +283 +284 +285 +286 +287 +288 |
|
rag_experience_retrieve(request, experience_top_k)
+
+Retrieving experience examples for the user request.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/app_agent.py
243 +244 +245 +246 +247 +248 +249 +250 +251 +252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 |
|
The Blackboard
is a shared memory space that is visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The Blackboard
is a key component of the UFO framework, enabling agents to share information and collaborate to fulfill user requests. The Blackboard
is implemented as a class in the ufo/agents/memory/blackboard.py
file.
The Blackboard
consists of the following data components:
Component | +Description | +
---|---|
questions |
+A list of questions that UFO asks the user, along with their corresponding answers. | +
requests |
+A list of historical user requests received in previous Round . |
+
trajectories |
+A list of step-wise trajectories that record the agent's actions and decisions at each step. | +
screenshots |
+A list of screenshots taken by the agent when it believes the current state is important for future reference. | +
Tip
+The keys stored in the trajectories
are configured as HISTORY_KEYS
in the config_dev.yaml
file. You can customize the keys based on your requirements and the agent's logic.
Tip
+Whether to save the screenshots is determined by the AppAgent
. You can enable or disable screenshot capture by setting the SCREENSHOT_TO_MEMORY
flag in the config_dev.yaml
file.
Data in the Blackboard
is based on the MemoryItem
class. It has a method blackboard_to_prompt
that converts the information stored in the Blackboard
to a string prompt. Agents call this method to construct the prompt for the LLM's inference. The blackboard_to_prompt
method is defined as follows:
def blackboard_to_prompt(self) -> List[str]:
+ """
+ Convert the blackboard to a prompt.
+ :return: The prompt.
+ """
+ prefix = [
+ {
+ "type": "text",
+ "text": "[Blackboard:]",
+ }
+ ]
+
+ blackboard_prompt = (
+ prefix
+ + self.texts_to_prompt(self.questions, "[Questions & Answers:]")
+ + self.texts_to_prompt(self.requests, "[Request History:]")
+ + self.texts_to_prompt(self.trajectories, "[Step Trajectories Completed Previously:]")
+ + self.screenshots_to_prompt()
+ )
+
+ return blackboard_prompt
+
+Class for the blackboard, which stores the data and images which are visible to all the agents.
+ +Initialize the blackboard.
+ + + + + + +agents/memory/blackboard.py
41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 |
|
questions: Memory
+
+
+ property
+
+
+Get the data from the blackboard.
+ + +Returns: | +
+
|
+
---|
requests: Memory
+
+
+ property
+
+
+Get the data from the blackboard.
+ + +Returns: | +
+
|
+
---|
screenshots: Memory
+
+
+ property
+
+
+Get the images from the blackboard.
+ + +Returns: | +
+
|
+
---|
trajectories: Memory
+
+
+ property
+
+
+Get the data from the blackboard.
+ + +Returns: | +
+
|
+
---|
add_data(data, memory)
+
+Add the data to the a memory in the blackboard.
+ + +Parameters: | +
+
|
+
---|
agents/memory/blackboard.py
87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 +105 |
|
add_image(screenshot_path='', metadata=None)
+
+Add the image to the blackboard.
+ + +Parameters: | +
+
|
+
---|
agents/memory/blackboard.py
131 +132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 +162 |
|
add_questions(questions)
+
+Add the data to the blackboard.
+ + +Parameters: | +
+
|
+
---|
agents/memory/blackboard.py
107 +108 +109 +110 +111 +112 +113 |
|
add_requests(requests)
+
+Add the data to the blackboard.
+ + +Parameters: | +
+
|
+
---|
agents/memory/blackboard.py
115 +116 +117 +118 +119 +120 +121 |
|
add_trajectories(trajectories)
+
+Add the data to the blackboard.
+ + +Parameters: | +
+
|
+
---|
agents/memory/blackboard.py
123 +124 +125 +126 +127 +128 +129 |
|
blackboard_to_prompt()
+
+Convert the blackboard to a prompt.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
241 +242 +243 +244 +245 +246 +247 +248 +249 +250 +251 +252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 |
|
clear()
+
+Clear the blackboard.
+ +agents/memory/blackboard.py
277 +278 +279 +280 +281 +282 +283 +284 |
|
is_empty()
+
+Check if the blackboard is empty.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
265 +266 +267 +268 +269 +270 +271 +272 +273 +274 +275 |
|
load_questions(file_path, last_k=-1)
+
+Load the data from a file.
+ + +Parameters: | +
+
|
+
---|
agents/memory/blackboard.py
192 +193 +194 +195 +196 +197 +198 +199 +200 |
|
questions_to_json()
+
+Convert the data to a dictionary.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
164 +165 +166 +167 +168 +169 |
|
read_json_file(file_path, last_k=-1)
+
+
+ staticmethod
+
+
+Read the json file.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/memory/blackboard.py
286 +287 +288 +289 +290 +291 +292 +293 +294 +295 +296 +297 +298 +299 +300 +301 +302 +303 +304 +305 +306 +307 +308 +309 +310 +311 +312 +313 +314 +315 |
|
requests_to_json()
+
+Convert the data to a dictionary.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
171 +172 +173 +174 +175 +176 |
|
screenshots_to_json()
+
+Convert the images to a dictionary.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
185 +186 +187 +188 +189 +190 |
|
screenshots_to_prompt()
+
+Convert the images to a prompt.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
214 +215 +216 +217 +218 +219 +220 +221 +222 +223 +224 +225 +226 +227 +228 +229 +230 +231 +232 +233 +234 +235 +236 +237 +238 +239 |
|
texts_to_prompt(memory, prefix)
+
+Convert the data to a prompt.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
202 +203 +204 +205 +206 +207 +208 +209 +210 +211 +212 |
|
trajectories_to_json()
+
+Convert the data to a dictionary.
+ + +Returns: | +
+
|
+
---|
agents/memory/blackboard.py
178 +179 +180 +181 +182 +183 |
|
Note
+You can customize the class to tailor the Blackboard
to your requirements.
The Memory
manages the memory of the agent and stores the information required for the agent to interact with the user and applications at every step. Parts of elements in the Memory
will be visible to the agent for decision-making.
A MemoryItem
is a dataclass
that represents a single step in the agent's memory. The fields of a MemoryItem
is flexible and can be customized based on the requirements of the agent. The MemoryItem
class is defined as follows:
This data class represents a memory item of an agent at one step.
+ + + + + + + + + +attributes: List[str]
+
+
+ property
+
+
+Get the attributes of the memory item.
+ + +Returns: | +
+
|
+
---|
add_values_from_dict(values)
+
+Add fields to the memory item.
+ + +Parameters: | +
+
|
+
---|
agents/memory/memory.py
57 +58 +59 +60 +61 +62 +63 |
|
filter(keys=[])
+
+Fetch the memory item.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/memory/memory.py
37 +38 +39 +40 +41 +42 +43 +44 |
|
get_value(key)
+
+Get the value of the field.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/memory/memory.py
65 +66 +67 +68 +69 +70 +71 +72 |
|
get_values(keys)
+
+Get the values of the fields.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/memory/memory.py
74 +75 +76 +77 +78 +79 +80 |
|
set_value(key, value)
+
+Add a field to the memory item.
+ + +Parameters: | +
+
|
+
---|
agents/memory/memory.py
46 +47 +48 +49 +50 +51 +52 +53 +54 +55 |
|
to_dict()
+
+Convert the MemoryItem to a dictionary.
+ + +Returns: | +
+
|
+
---|
agents/memory/memory.py
19 +20 +21 +22 +23 +24 +25 +26 +27 +28 |
|
to_json()
+
+Convert the memory item to a JSON string.
+ + +Returns: | +
+
|
+
---|
agents/memory/memory.py
30 +31 +32 +33 +34 +35 |
|
Info
+At each step, an instance of MemoryItem
is created and stored in the Memory
to record the information of the agent's interaction with the user and applications.
The Memory
class is responsible for managing the memory of the agent. It stores a list of MemoryItem
instances that represent the agent's memory at each step. The Memory
class is defined as follows:
This data class represents a memory of an agent.
+ + + + + + + + + +content: List[MemoryItem]
+
+
+ property
+
+
+Get the content of the memory.
+ + +Returns: | +
+
|
+
---|
length: int
+
+
+ property
+
+
+Get the length of the memory.
+ + +Returns: | +
+
|
+
---|
list_content: List[Dict[str, str]]
+
+
+ property
+
+
+List the content of the memory.
+ + +Returns: | +
+
|
+
---|
add_memory_item(memory_item)
+
+Add a memory item to the memory.
+ + +Parameters: | +
+
|
+
---|
agents/memory/memory.py
122 +123 +124 +125 +126 +127 |
|
clear()
+
+Clear the memory.
+ +agents/memory/memory.py
129 +130 +131 +132 +133 |
|
delete_memory_item(step)
+
+Delete a memory item from the memory.
+ + +Parameters: | +
+
|
+
---|
agents/memory/memory.py
143 +144 +145 +146 +147 +148 |
|
filter_memory_from_keys(keys)
+
+Filter the memory from the keys. If an item does not have the key, the key will be ignored.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/memory/memory.py
114 +115 +116 +117 +118 +119 +120 |
|
filter_memory_from_steps(steps)
+
+Filter the memory from the steps.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/memory/memory.py
106 +107 +108 +109 +110 +111 +112 |
|
get_latest_item()
+
+Get the latest memory item.
+ + +Returns: | +
+
|
+
---|
agents/memory/memory.py
160 +161 +162 +163 +164 +165 +166 +167 |
|
is_empty()
+
+Check if the memory is empty.
+ + +Returns: | +
+
|
+
---|
agents/memory/memory.py
185 +186 +187 +188 +189 +190 |
|
load(content)
+
+Load the data from the memory.
+ + +Parameters: | +
+
|
+
---|
agents/memory/memory.py
99 +100 +101 +102 +103 +104 |
|
to_json()
+
+Convert the memory to a JSON string.
+ + +Returns: | +
+
|
+
---|
agents/memory/memory.py
150 +151 +152 +153 +154 +155 +156 +157 +158 |
|
Info
+Each agent has its own Memory
instance to store their information.
Info
+Not all information in the Memory
are provided to the agent for decision-making. The agent can access parts of the memory based on the requirements of the agent's logic.
The Processor
is a key component of the agent to process the core logic of the agent to process the user's request. The Processor
is implemented as a class in the ufo/agents/processors
folder. Each agent has its own Processor
class withing the folder.
Once called, an agent follows a series of steps to process the user's request defined in the Processor
class by calling the process
method. The workflow of the process
is as follows:
Step | +Description | +Function | +
---|---|---|
1 | +Print the step information. | +print_step_info |
+
2 | +Capture the screenshot of the application. | +capture_screenshot |
+
3 | +Get the control information of the application. | +get_control_info |
+
4 | +Get the prompt message for the LLM. | +get_prompt_message |
+
5 | +Generate the response from the LLM. | +get_response |
+
6 | +Update the cost of the step. | +update_cost |
+
7 | +Parse the response from the LLM. | +parse_response |
+
8 | +Execute the action based on the response. | +execute_action |
+
9 | +Update the memory and blackboard. | +update_memory |
+
10 | +Update the status of the agent. | +update_status |
+
At each step, the Processor
processes the user's request by invoking the corresponding method sequentially to execute the necessary actions.
The process may be paused. It can be resumed, based on the agent's logic and the user's request using the resume
method.
Below is the basic structure of the Processor
class:
+
+ Bases: ABC
The base processor for the session. A session consists of multiple rounds of conversation with the user, completing a task. +At each round, the HostAgent and AppAgent interact with the user and the application with the processor. +Each processor is responsible for processing the user request and updating the HostAgent and AppAgent at a single step in a round.
+ +Initialize the processor.
+ + +Parameters: | +
+
|
+
---|
agents/processors/basic.py
35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 |
|
action: str
+
+
+ property
+ writable
+
+
+Get the action.
+ + +Returns: | +
+
|
+
---|
agent: BasicAgent
+
+
+ property
+
+
+Get the agent.
+ + +Returns: | +
+
|
+
---|
app_root: str
+
+
+ property
+ writable
+
+
+Get the application root.
+ + +Returns: | +
+
|
+
---|
application_process_name: str
+
+
+ property
+ writable
+
+
+Get the application process name.
+ + +Returns: | +
+
|
+
---|
application_window: UIAWrapper
+
+
+ property
+ writable
+
+
+Get the active window.
+ + +Returns: | +
+
|
+
---|
context: Context
+
+
+ property
+
+
+Get the context.
+ + +Returns: | +
+
|
+
---|
control_label: str
+
+
+ property
+ writable
+
+
+Get the control label.
+ + +Returns: | +
+
|
+
---|
control_reannotate: List[str]
+
+
+ property
+ writable
+
+
+Get the control reannotation.
+ + +Returns: | +
+
|
+
---|
control_text: str
+
+
+ property
+ writable
+
+
+Get the active application.
+ + +Returns: | +
+
|
+
---|
cost: float
+
+
+ property
+ writable
+
+
+Get the cost of the processor.
+ + +Returns: | +
+
|
+
---|
host_message: List[str]
+
+
+ property
+ writable
+
+
+Get the host message.
+ + +Returns: | +
+
|
+
---|
log_path: str
+
+
+ property
+
+
+Get the log path.
+ + +Returns: | +
+
|
+
---|
logger: str
+
+
+ property
+
+
+Get the logger.
+ + +Returns: | +
+
|
+
---|
name: str
+
+
+ property
+
+
+Get the name of the processor.
+ + +Returns: | +
+
|
+
---|
plan: str
+
+
+ property
+ writable
+
+
+Get the plan of the agent.
+ + +Returns: | +
+
|
+
---|
prev_plan: List[str]
+
+
+ property
+
+
+Get the previous plan.
+ + +Returns: | +
+
|
+
---|
previous_subtasks: List[str]
+
+
+ property
+ writable
+
+
+Get the previous subtasks.
+ + +Returns: | +
+
|
+
---|
question_list: List[str]
+
+
+ property
+ writable
+
+
+Get the question list.
+ + +Returns: | +
+
|
+
---|
request: str
+
+
+ property
+
+
+Get the request.
+ + +Returns: | +
+
|
+
---|
request_logger: str
+
+
+ property
+
+
+Get the request logger.
+ + +Returns: | +
+
|
+
---|
round_cost: float
+
+
+ property
+ writable
+
+
+Get the round cost.
+ + +Returns: | +
+
|
+
---|
round_num: int
+
+
+ property
+
+
+Get the round number.
+ + +Returns: | +
+
|
+
---|
round_step: int
+
+
+ property
+ writable
+
+
+Get the round step.
+ + +Returns: | +
+
|
+
---|
round_subtask_amount: int
+
+
+ property
+
+
+Get the round subtask amount.
+ + +Returns: | +
+
|
+
---|
session_cost: float
+
+
+ property
+ writable
+
+
+Get the session cost.
+ + +Returns: | +
+
|
+
---|
session_step: int
+
+
+ property
+ writable
+
+
+Get the session step.
+ + +Returns: | +
+
|
+
---|
status: str
+
+
+ property
+ writable
+
+
+Get the status of the processor.
+ + +Returns: | +
+
|
+
---|
subtask: str
+
+
+ property
+ writable
+
+
+Get the subtask.
+ + +Returns: | +
+
|
+
---|
ui_tree_path: str
+
+
+ property
+
+
+Get the UI tree path.
+ + +Returns: | +
+
|
+
---|
add_to_memory(data_dict)
+
+Add the data to the memory.
+ + +Parameters: | +
+
|
+
---|
agents/processors/basic.py
297 +298 +299 +300 +301 +302 |
|
capture_screenshot()
+
+
+ abstractmethod
+
+
+Capture the screenshot.
+ +agents/processors/basic.py
235 +236 +237 +238 +239 +240 |
|
exception_capture(func)
+
+
+ classmethod
+
+
+Decorator to capture the exception of the method.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/processors/basic.py
185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 +199 +200 +201 +202 +203 +204 +205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 |
|
execute_action()
+
+
+ abstractmethod
+
+
+Execute the action.
+ +agents/processors/basic.py
270 +271 +272 +273 +274 +275 |
|
get_control_info()
+
+
+ abstractmethod
+
+
+Get the control information.
+ +agents/processors/basic.py
242 +243 +244 +245 +246 +247 |
|
get_prompt_message()
+
+
+ abstractmethod
+
+
+Get the prompt message.
+ +agents/processors/basic.py
249 +250 +251 +252 +253 +254 |
|
get_response()
+
+
+ abstractmethod
+
+
+Get the response from the LLM.
+ +agents/processors/basic.py
256 +257 +258 +259 +260 +261 |
|
is_confirm()
+
+Check if the process is confirm.
+ + +Returns: | +
+
|
+
---|
agents/processors/basic.py
736 +737 +738 +739 +740 +741 +742 +743 +744 |
|
is_error()
+
+Check if the process is in error.
+ + +Returns: | +
+
|
+
---|
agents/processors/basic.py
704 +705 +706 +707 +708 +709 +710 +711 |
|
is_paused()
+
+Check if the process is paused.
+ + +Returns: | +
+
|
+
---|
agents/processors/basic.py
713 +714 +715 +716 +717 +718 +719 +720 +721 +722 +723 +724 |
|
is_pending()
+
+Check if the process is pending.
+ + +Returns: | +
+
|
+
---|
agents/processors/basic.py
726 +727 +728 +729 +730 +731 +732 +733 +734 |
|
log(response_json)
+
+Set the result of the session, and log the result. +result: The result of the session. +response_json: The response json. +return: The response json.
+ +agents/processors/basic.py
746 +747 +748 +749 +750 +751 +752 +753 +754 |
|
log_save()
+
+Save the log.
+ +agents/processors/basic.py
304 +305 +306 +307 +308 +309 +310 +311 +312 |
|
method_timer(func)
+
+
+ classmethod
+
+
+Decorator to calculate the time cost of the method.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/processors/basic.py
167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 |
|
parse_response()
+
+
+ abstractmethod
+
+
+Parse the response.
+ +agents/processors/basic.py
263 +264 +265 +266 +267 +268 |
|
print_step_info()
+
+
+ abstractmethod
+
+
+Print the step information.
+ +agents/processors/basic.py
228 +229 +230 +231 +232 +233 |
|
process()
+
+Process a single step in a round. +The process includes the following steps: +1. Print the step information. +2. Capture the screenshot. +3. Get the control information. +4. Get the prompt message. +5. Get the response. +6. Update the cost. +7. Parse the response. +8. Execute the action. +9. Update the memory. +10. Update the step and status. +11. Save the log.
+ +agents/processors/basic.py
73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 |
|
resume()
+
+Resume the process of action execution after the session is paused.
+ +agents/processors/basic.py
142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 +164 +165 |
|
string2list(string)
+
+
+ staticmethod
+
+
+Convert a string to a list of string if the input is a string.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/processors/basic.py
764 +765 +766 +767 +768 +769 +770 +771 +772 +773 +774 |
|
sync_memory()
+
+
+ abstractmethod
+
+
+Sync the memory of the Agent.
+ +agents/processors/basic.py
221 +222 +223 +224 +225 +226 |
|
update_cost()
+
+Update the cost.
+ +agents/processors/basic.py
322 +323 +324 +325 +326 +327 +328 |
|
update_memory()
+
+
+ abstractmethod
+
+
+Update the memory of the Agent.
+ +agents/processors/basic.py
277 +278 +279 +280 +281 +282 |
|
update_status()
+
+Update the status of the session.
+ +agents/processors/basic.py
284 +285 +286 +287 +288 +289 +290 +291 +292 +293 +294 +295 |
|
The Prompter
is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. The Prompter
is implemented in the ufo/prompts
folder. Each agent has its own Prompter
class that defines the structure of the prompt and the information to be fed to the LLM.
A prompt fed to the LLM usually a list of dictionaries, where each dictionary contains the following keys:
+Key | +Description | +
---|---|
role |
+The role of the text in the prompt, can be system , user , or assistant . |
+
content |
+The content of the text for the specific role. | +
Tip
+You may find the official documentation helpful for constructing the prompt.
+In the __init__
method of the Prompter
class, you can define the template of the prompt for each component, and the final prompt message is constructed by combining the templates of each component using the prompt_construction
method.
The system prompt use the template configured in the config_dev.yaml
file for each agent. It usually contains the instructions for the agent's role, action, tips, reponse format, etc.
+You need use the system_prompt_construction
method to construct the system prompt.
Prompts on the API instructions, and demonstration examples are also included in the system prompt, which are constructed by the api_prompt_helper
and examples_prompt_helper
methods respectively. Below is the sub-components of the system prompt:
Component | +Description | +Method | +
---|---|---|
apis |
+The API instructions for the agent. | +api_prompt_helper |
+
examples |
+The demonstration examples for the agent. | +examples_prompt_helper |
+
The user prompt is constructed based on the information from the agent's observation, external knowledge, and Blackboard
. You can use the user_prompt_construction
method to construct the user prompt. Below is the sub-components of the user prompt:
Component | +Description | +Method | +
---|---|---|
observation |
+The observation of the agent. | +user_content_construction |
+
retrieved_docs |
+The knowledge retrieved from the external knowledge base. | +retrived_documents_prompt_helper |
+
blackboard |
+The information stored in the Blackboard . |
+blackboard_to_prompt |
+
You can find the implementation of the Prompter
in the ufo/prompts
folder. Below is the basic structure of the Prompter
class:
+ Bases: ABC
The BasicPrompter class is the abstract class for the prompter.
+ +Initialize the BasicPrompter.
+ + +Parameters: | +
+
|
+
---|
prompter/basic.py
18 +19 +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 |
|
api_prompt_helper()
+
+A helper function to construct the API list and descriptions for the prompt.
+ +prompter/basic.py
139 +140 +141 +142 +143 +144 |
|
examples_prompt_helper()
+
+A helper function to construct the examples prompt for in-context learning.
+ +prompter/basic.py
132 +133 +134 +135 +136 +137 |
|
load_prompt_template(template_path, is_visual=None)
+
+
+ staticmethod
+
+
+Load the prompt template.
+ + +Returns: | +
+
|
+
---|
prompter/basic.py
39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 |
|
prompt_construction(system_prompt, user_content)
+
+
+ staticmethod
+
+
+Construct the prompt for summarizing the experience into an example.
+ + +Parameters: | +
+
|
+
---|
prompter/basic.py
66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 +80 +81 +82 |
|
retrived_documents_prompt_helper(header, separator, documents)
+
+
+ staticmethod
+
+
+Construct the prompt for retrieved documents.
+ + +Parameters: | +
+
|
+
---|
prompter/basic.py
84 + 85 + 86 + 87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 +105 +106 |
|
system_prompt_construction()
+
+
+ abstractmethod
+
+
+Construct the system prompt for LLM.
+ +prompter/basic.py
108 +109 +110 +111 +112 +113 +114 |
|
user_content_construction()
+
+
+ abstractmethod
+
+
+Construct the full user content for LLM, including the user prompt and images.
+ +prompter/basic.py
124 +125 +126 +127 +128 +129 +130 |
|
user_prompt_construction()
+
+
+ abstractmethod
+
+
+Construct the textual user prompt for LLM based on the user
field in the prompt template.
prompter/basic.py
116 +117 +118 +119 +120 +121 +122 |
|
Tip
+You can customize the Prompter
class to tailor the prompt to your requirements.
The State
class is a fundamental component of the UFO agent framework. It represents the current state of the agent and determines the next action and agent to handle the request. Each agent has a specific set of states that define the agent's behavior and workflow.
The set of states for an agent is defined in the AgentStatus
class:
class AgentStatus(Enum):
+ """
+ The status class for the agent.
+ """
+
+ ERROR = "ERROR"
+ FINISH = "FINISH"
+ CONTINUE = "CONTINUE"
+ FAIL = "FAIL"
+ PENDING = "PENDING"
+ CONFIRM = "CONFIRM"
+ SCREENSHOT = "SCREENSHOT"
+
+Each agent implements its own set of AgentStatus
to define the states of the agent.
The class AgentStateManager
manages the state mapping from a string to the corresponding state class. Each state class is registered with the AgentStateManager
using the register
decorator to associate the state class with a specific agent, e.g.,
@AgentStateManager.register
+class SomeAgentState(AgentState):
+ """
+ The state class for the some agent.
+ """
+
+Tip
+You can find examples on how to register the state class for the AppAgent
in the ufo/agents/states/app_agent_state.py
file.
Below is the basic structure of the AgentStateManager
class:
class AgentStateManager(ABC, metaclass=SingletonABCMeta):
+ """
+ A abstract class to manage the states of the agent.
+ """
+
+ _state_mapping: Dict[str, Type[AgentState]] = {}
+
+ def __init__(self):
+ """
+ Initialize the state manager.
+ """
+
+ self._state_instance_mapping: Dict[str, AgentState] = {}
+
+ def get_state(self, status: str) -> AgentState:
+ """
+ Get the state for the status.
+ :param status: The status string.
+ :return: The state object.
+ """
+
+ # Lazy load the state class
+ if status not in self._state_instance_mapping:
+ state_class = self._state_mapping.get(status)
+ if state_class:
+ self._state_instance_mapping[status] = state_class()
+ else:
+ self._state_instance_mapping[status] = self.none_state
+
+ state = self._state_instance_mapping.get(status, self.none_state)
+
+ return state
+
+ def add_state(self, status: str, state: AgentState) -> None:
+ """
+ Add a new state to the state mapping.
+ :param status: The status string.
+ :param state: The state object.
+ """
+ self.state_map[status] = state
+
+ @property
+ def state_map(self) -> Dict[str, AgentState]:
+ """
+ The state mapping of status to state.
+ :return: The state mapping.
+ """
+ return self._state_instance_mapping
+
+ @classmethod
+ def register(cls, state_class: Type[AgentState]) -> Type[AgentState]:
+ """
+ Decorator to register the state class to the state manager.
+ :param state_class: The state class to be registered.
+ :return: The state class.
+ """
+ cls._state_mapping[state_class.name()] = state_class
+ return state_class
+
+ @property
+ @abstractmethod
+ def none_state(self) -> AgentState:
+ """
+ The none state of the state manager.
+ """
+ pass
+
+Each state class inherits from the AgentState
class and must implement the method of handle
to process the action in the state. In addition, the next_state
and next_agent
methods are used to determine the next state and agent to handle the transition. Please find below the reference for the State
class in UFO.
+ Bases: ABC
The abstract class for the agent state.
+ + + + + + + + + +agent_class()
+
+
+ abstractmethod
+ classmethod
+
+
+The class of the agent.
+ + +Returns: | +
+
|
+
---|
agents/states/basic.py
165 +166 +167 +168 +169 +170 +171 +172 |
|
handle(agent, context=None)
+
+
+ abstractmethod
+
+
+Handle the agent for the current step.
+ + +Parameters: | +
+
|
+
---|
agents/states/basic.py
122 +123 +124 +125 +126 +127 +128 +129 |
|
is_round_end()
+
+
+ abstractmethod
+
+
+Check if the round ends.
+ + +Returns: | +
+
|
+
---|
agents/states/basic.py
149 +150 +151 +152 +153 +154 +155 |
|
is_subtask_end()
+
+
+ abstractmethod
+
+
+Check if the subtask ends.
+ + +Returns: | +
+
|
+
---|
agents/states/basic.py
157 +158 +159 +160 +161 +162 +163 |
|
name()
+
+
+ abstractmethod
+ classmethod
+
+
+The class name of the state.
+ + +Returns: | +
+
|
+
---|
agents/states/basic.py
174 +175 +176 +177 +178 +179 +180 +181 |
|
next_agent(agent)
+
+
+ abstractmethod
+
+
+Get the agent for the next step.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/states/basic.py
131 +132 +133 +134 +135 +136 +137 +138 |
|
next_state(agent)
+
+
+ abstractmethod
+
+
+Get the state for the next step.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/states/basic.py
140 +141 +142 +143 +144 +145 +146 +147 |
|
Tip
+The state machine diagrams for the HostAgent
and AppAgent
are shown in their respective documents.
Tip
+A Round
calls the handle
, next_state
, and next_agent
methods of the current state to process the user request and determine the next state and agent to handle the request, and orchestrates the agents to execute the necessary actions.
The objective of the EvaluationAgent
is to evaluate whether a Session
or Round
has been successfully completed. The EvaluationAgent
assesses the performance of the HostAgent
and AppAgent
in fulfilling the request. You can configure whether to enable the EvaluationAgent
in the config_dev.yaml
file and the detailed documentation can be found here.
Note
+The EvaluationAgent
is fully LLM-driven and conducts evaluations based on the action trajectories and screenshots. It may not by 100% accurate since LLM may make mistakes.
To enable the EvaluationAgent
, you can configure the following parameters in the config_dev.yaml
file to evaluate the task completion status at different levels:
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
EVA_SESSION |
+Whether to include the session in the evaluation. | +Boolean | +True | +
EVA_ROUND |
+Whether to include the round in the evaluation. | +Boolean | +False | +
EVA_ALL_SCREENSHOTS |
+Whether to include all the screenshots in the evaluation. | +Boolean | +True | +
The EvaluationAgent
takes the following inputs for evaluation:
Input | +Description | +Type | +
---|---|---|
User Request | +The user's request to be evaluated. | +String | +
APIs Description | +The description of the APIs used in the execution. | +List of Strings | +
Action Trajectories | +The action trajectories executed by the HostAgent and AppAgent . |
+List of Strings | +
Screenshots | +The screenshots captured during the execution. | +List of Images | +
For more details on how to construct the inputs, please refer to the EvaluationAgentPrompter
class in ufo/prompter/eva_prompter.py
.
Tip
+You can configure whether to use all screenshots or only the first and last screenshot for evaluation in the EVA_ALL_SCREENSHOTS
of the config_dev.yaml
file.
The EvaluationAgent
generates the following outputs after evaluation:
Output | +Description | +Type | +
---|---|---|
reason | +The detailed reason for your judgment, by observing the screenshot differences and the |
+String | +
sub_scores | +The sub-score of the evaluation in decomposing the evaluation into multiple sub-goals. | +List of Dictionaries | +
complete | +The completion status of the evaluation, can be yes , no , or unsure . |
+String | +
Below is an example of the evaluation output:
+{
+ "reason": "The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams.
+ The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open.
+ The agent then focused on the chat window, input the message 'hello', and clicked the Send button.
+ The final screenshot confirms that the message 'hello' was sent to Zac.",
+ "sub_scores": {
+ "correct application focus": "yes",
+ "correct message input": "yes",
+ "message sent successfully": "yes"
+ },
+ "complete": "yes"}
+
+Info
+The log of the evaluation results will be saved in the logs/{task_name}/evaluation.log
file.
The EvaluationAgent
employs the CoT mechanism to first decompose the evaluation into multiple sub-goals and then evaluate each sub-goal separately. The sub-scores are then aggregated to determine the overall completion status of the evaluation.
+ Bases: BasicAgent
The agent for evaluation.
+ +Initialize the FollowAgent. +:agent_type: The type of the agent. +:is_visual: The flag indicating whether the agent is visual or not.
+ + + + + + +agents/agent/evaluation_agent.py
27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 |
|
status_manager: EvaluatonAgentStatus
+
+
+ property
+
+
+Get the status manager.
+evaluate(request, log_path, eva_all_screenshots=True)
+
+Evaluate the task completion.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/evaluation_agent.py
104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 |
|
get_prompter(is_visual, prompt_template, example_prompt_template, api_prompt_template, root_name=None)
+
+Get the prompter for the agent.
+ +agents/agent/evaluation_agent.py
53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 |
|
message_constructor(log_path, request, eva_all_screenshots=True)
+
+Construct the message.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/evaluation_agent.py
73 +74 +75 +76 +77 +78 +79 +80 +81 +82 +83 +84 +85 +86 +87 +88 +89 +90 +91 +92 +93 +94 |
|
print_response(response_dict)
+
+Print the response of the evaluation.
+ + +Parameters: | +
+
|
+
---|
agents/agent/evaluation_agent.py
130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 |
|
process_comfirmation()
+
+Comfirmation, currently do nothing.
+ +agents/agent/evaluation_agent.py
124 +125 +126 +127 +128 |
|
The FollowerAgent
is inherited from the AppAgent
and is responsible for following the user's instructions to perform specific tasks within the application. The FollowerAgent
is designed to execute a series of actions based on the user's guidance. It is particularly useful for software testing, when clear instructions are provided to validate the application's behavior.
The FollowerAgent
shares most of the functionalities with the AppAgent
, but it is designed to follow the step-by-step instructions provided by the user, instead of does its own reasoning to determine the next action.
The FollowerAgent
is available in follower
mode. You can find more details in the documentation. It also uses differnt Session
and Processor
to handle the user's instructions. The step-wise instructions are provided by the user in the in a json file, which is then parsed by the FollowerAgent
to execute the actions. An example of the json file is shown below:
{
+ "task": "Type in a bold text of 'Test For Fun'",
+ "steps":
+ [
+ "1.type in 'Test For Fun'",
+ "2.select the text of 'Test For Fun'",
+ "3.click on the bold"
+ ],
+ "object": "draft.docx"
+}
+
+
+ Bases: AppAgent
The FollowerAgent class the manager of a FollowedAgent that follows the step-by-step instructions for action execution within an application. +It is a subclass of the AppAgent, which completes the action execution within the application.
+ +Initialize the FollowAgent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/follower_agent.py
21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 |
|
get_prompter(is_visual, main_prompt, example_prompt, api_prompt, app_info_prompt, app_root_name='')
+
+Get the prompter for the follower agent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/follower_agent.py
63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 +80 +81 +82 +83 +84 +85 +86 +87 +88 +89 |
|
message_constructor(dynamic_examples, dynamic_tips, dynamic_knowledge, image_list, control_info, prev_subtask, plan, request, subtask, host_message, current_state, state_diff, include_last_screenshot)
+
+Construct the prompt message for the FollowAgent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/follower_agent.py
91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 |
|
The HostAgent
assumes three primary responsibilities:
HostAgent
engages with the user to understand their request and analyze their intent. It also conversates with the user to gather additional information when necessary.HostAgent
manages the creation and registration of AppAgents
to fulfill the user's request. It also orchestrates the interaction between the AppAgents
and the application.HostAgent
analyzes the user's request, to decompose it into sub-tasks and distribute them among the AppAgents
. It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents
to ensure the successful completion of the user's request.HostAgent
can execute bash commands to open applications or execute system commands to support the user's request and the AppAgents
' execution.HostAgent
communicates with the AppAgents
to exchange information. It also manages the Blackboard
to store and share information among the agents, as shown below:The HostAgent
activates its Processor
to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent
for execution. The HostAgent
monitors the progress of the AppAgents
and ensures the successful completion of the user's request.
The HostAgent
receives the following inputs:
Input | +Description | +Type | +
---|---|---|
User Request | +The user's request in natural language. | +String | +
Application Information | +Information about the existing active applications. | +List of Strings | +
Desktop Screenshots | +Screenshots of the desktop to provide context to the HostAgent . |
+Image | +
Previous Sub-Tasks | +The previous sub-tasks and their completion status. | +List of Strings | +
Previous Plan | +The previous plan for the following sub-tasks. | +List of Strings | +
Blackboard | +The shared memory space for storing and sharing information among the agents. | +Dictionary | +
By processing these inputs, the HostAgent
determines the appropriate application to fulfill the user's request and orchestrates the AppAgents
to execute the necessary actions.
With the inputs provided, the HostAgent
generates the following outputs:
Output | +Description | +Type | +
---|---|---|
Observation | +The observation of current desktop screenshots. | +String | +
Thought | +The logical reasoning process of the HostAgent . |
+String | +
Current Sub-Task | +The current sub-task to be executed by the AppAgent . |
+String | +
Message | +The message to be sent to the AppAgent for the completion of the sub-task. |
+String | +
ControlLabel | +The index of the selected application to execute the sub-task. | +String | +
ControlText | +The name of the selected application to execute the sub-task. | +String | +
Plan | +The plan for the following sub-tasks after the current sub-task. | +List of Strings | +
Status | +The status of the agent, mapped to the AgentState . |
+String | +
Comment | +Additional comments or information provided to the user. | +String | +
Questions | +The questions to be asked to the user for additional information. | +List of Strings | +
Bash | +The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. |
+String | +
Below is an example of the HostAgent
output:
{
+ "Observation": "Desktop screenshot",
+ "Thought": "Logical reasoning process",
+ "Current Sub-Task": "Sub-task description",
+ "Message": "Message to AppAgent",
+ "ControlLabel": "Application index",
+ "ControlText": "Application name",
+ "Plan": ["Sub-task 1", "Sub-task 2"],
+ "Status": "AgentState",
+ "Comment": "Additional comments",
+ "Questions": ["Question 1", "Question 2"],
+ "Bash": "Bash command"
+}
+
+Info
+The HostAgent
output is formatted as a JSON object by LLMs and can be parsed by the json.loads
method in Python.
The HostAgent
progresses through different states, as defined in the ufo/agents/states/host_agent_states.py
module. The states include:
State | +Description | +
---|---|
CONTINUE |
+The HostAgent is ready to process the user's request and emloy the Processor to decompose it into sub-tasks. |
+
ASSIGN |
+The HostAgent is assigning the sub-tasks to the AppAgents for execution. |
+
FINISH |
+The overall task is completed, and the HostAgent is ready to return the results to the user. |
+
ERROR |
+An error occurred during the processing of the user's request, and the HostAgent is unable to proceed. |
+
FAIL |
+The HostAgent believes the task is unachievable and cannot proceed further. |
+
PENDING |
+The HostAgent is waiting for additional information from the user to proceed. |
+
The state machine diagram for the HostAgent
is shown below:
The HostAgent
transitions between these states based on the user's request, the application information, and the progress of the AppAgents
in executing the sub-tasks.
Upon receiving the user's request, the HostAgent
decomposes it into sub-tasks and assigns each sub-task to an AppAgent
for execution. The HostAgent
determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents
to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:
When the HostAgent
determines the need for a new AppAgent
to fulfill a sub-task, it creates an instance of the AppAgent
and registers it with the HostAgent
, by calling the create_subagent
method:
def create_subagent(
+ self,
+ agent_type: str,
+ agent_name: str,
+ process_name: str,
+ app_root_name: str,
+ is_visual: bool,
+ main_prompt: str,
+ example_prompt: str,
+ api_prompt: str,
+ *args,
+ **kwargs,
+ ) -> BasicAgent:
+ """
+ Create an SubAgent hosted by the HostAgent.
+ :param agent_type: The type of the agent to create.
+ :param agent_name: The name of the SubAgent.
+ :param process_name: The process name of the app.
+ :param app_root_name: The root name of the app.
+ :param is_visual: The flag indicating whether the agent is visual or not.
+ :param main_prompt: The main prompt file path.
+ :param example_prompt: The example prompt file path.
+ :param api_prompt: The API prompt file path.
+ :return: The created SubAgent.
+ """
+ app_agent = self.agent_factory.create_agent(
+ agent_type,
+ agent_name,
+ process_name,
+ app_root_name,
+ is_visual,
+ main_prompt,
+ example_prompt,
+ api_prompt,
+ *args,
+ **kwargs,
+ )
+ self.appagent_dict[agent_name] = app_agent
+ app_agent.host = self
+ self._active_appagent = app_agent
+
+ return app_agent
+
+The HostAgent
then assigns the sub-task to the AppAgent
for execution and monitors its progress.
+ Bases: BasicAgent
The HostAgent class the manager of AppAgents.
+ +Initialize the HostAgent. +:name: The name of the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/host_agent.py
51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 +80 |
|
blackboard
+
+
+ property
+
+
+Get the blackboard.
+status_manager: HostAgentStatus
+
+
+ property
+
+
+Get the status manager.
+sub_agent_amount: int
+
+
+ property
+
+
+Get the amount of sub agents.
+ + +Returns: | +
+
|
+
---|
create_app_agent(application_window_name, application_root_name, request, mode)
+
+Create the app agent for the host agent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/host_agent.py
220 +221 +222 +223 +224 +225 +226 +227 +228 +229 +230 +231 +232 +233 +234 +235 +236 +237 +238 +239 +240 +241 +242 +243 +244 +245 +246 +247 +248 +249 +250 +251 +252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 +268 +269 +270 +271 +272 +273 +274 +275 +276 +277 +278 +279 +280 +281 +282 +283 +284 +285 +286 +287 |
|
create_puppeteer_interface()
+
+Create the Puppeteer interface to automate the app.
+ + +Returns: | +
+
|
+
---|
agents/agent/host_agent.py
213 +214 +215 +216 +217 +218 |
|
create_subagent(agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs)
+
+Create an SubAgent hosted by the HostAgent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/host_agent.py
99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 |
|
get_active_appagent()
+
+Get the active app agent.
+ + +Returns: | +
+
|
+
---|
agents/agent/host_agent.py
150 +151 +152 +153 +154 +155 |
|
get_prompter(is_visual, main_prompt, example_prompt, api_prompt)
+
+Get the prompt for the agent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/host_agent.py
82 +83 +84 +85 +86 +87 +88 +89 +90 +91 +92 +93 +94 +95 +96 +97 |
|
message_constructor(image_list, os_info, plan, prev_subtask, request)
+
+Construct the message.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/host_agent.py
164 +165 +166 +167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 +199 +200 |
|
print_response(response_dict)
+
+Print the response.
+ + +Parameters: | +
+
|
+
---|
agents/agent/host_agent.py
295 +296 +297 +298 +299 +300 +301 +302 +303 +304 +305 +306 +307 +308 +309 +310 +311 +312 +313 +314 +315 +316 +317 +318 +319 +320 +321 +322 +323 +324 +325 +326 +327 +328 +329 +330 +331 +332 +333 +334 +335 +336 +337 +338 +339 +340 +341 +342 +343 +344 |
|
process(context)
+
+Process the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/host_agent.py
202 +203 +204 +205 +206 +207 +208 +209 +210 +211 |
|
process_comfirmation()
+
+TODO: Process the confirmation.
+ +agents/agent/host_agent.py
289 +290 +291 +292 +293 |
|
In UFO, there are four types of agents: HostAgent
, AppAgent
, FollowerAgent
, and EvaluationAgent
. Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process:
Agent | +Description | +
---|---|
HostAgent |
+Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. | +
AppAgent |
+Executes actions on the selected application. | +
FollowerAgent |
+Follows the user's instructions to complete the task. | +
EvaluationAgent |
+Evaluates the completeness of a session or a round. | +
In the normal workflow, only the HostAgent
and AppAgent
are involved in the user interaction process. The FollowerAgent
and EvaluationAgent
are used for specific tasks.
Please see below the orchestration of the agents in UFO:
+An agent in UFO is composed of the following main components to fulfill its role in the UFO system:
+Component | +Description | +
---|---|
State |
+Represents the current state of the agent and determines the next action and agent to handle the request. | +
Memory |
+Stores information about the user request, application state, and other relevant data. | +
Blackboard |
+Stores information shared between agents. | +
Prompter |
+Generates prompts for the language model based on the user request and application state. | +
Processor |
+Processes the workflow of the agent, including handling user requests, executing actions, and memory management. | +
Below is the reference for the Agent
class in UFO. All agents in UFO inherit from the Agent
class and implement necessary methods to fulfill their roles in the UFO system.
+ Bases: ABC
The BasicAgent class is the abstract class for the agent.
+ +Initialize the BasicAgent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/basic.py
37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 |
|
blackboard: Blackboard
+
+
+ property
+
+
+Get the blackboard.
+ + +Returns: | +
+
|
+
---|
host: HostAgent
+
+
+ property
+ writable
+
+
+Get the host of the agent.
+ + +Returns: | +
+
|
+
---|
memory: Memory
+
+
+ property
+
+
+Get the memory of the agent.
+ + +Returns: | +
+
|
+
---|
name: str
+
+
+ property
+
+
+Get the name of the agent.
+ + +Returns: | +
+
|
+
---|
processor: BaseProcessor
+
+
+ property
+ writable
+
+
+Get the processor.
+ + +Returns: | +
+
|
+
---|
state: AgentState
+
+
+ property
+
+
+Get the state of the agent.
+ + +Returns: | +
+
|
+
---|
status: str
+
+
+ property
+ writable
+
+
+Get the status of the agent.
+ + +Returns: | +
+
|
+
---|
status_manager: AgentStatus
+
+
+ property
+
+
+Get the status manager.
+ + +Returns: | +
+
|
+
---|
step: int
+
+
+ property
+ writable
+
+
+Get the step of the agent.
+ + +Returns: | +
+
|
+
---|
add_memory(memory_item)
+
+Update the memory of the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/basic.py
181 +182 +183 +184 +185 +186 |
|
build_experience_retriever()
+
+Build the experience retriever.
+ +agents/agent/basic.py
323 +324 +325 +326 +327 |
|
build_human_demonstration_retriever()
+
+Build the human demonstration retriever.
+ +agents/agent/basic.py
329 +330 +331 +332 +333 |
|
build_offline_docs_retriever()
+
+Build the offline docs retriever.
+ +agents/agent/basic.py
311 +312 +313 +314 +315 |
|
build_online_search_retriever()
+
+Build the online search retriever.
+ +agents/agent/basic.py
317 +318 +319 +320 +321 |
|
clear_memory()
+
+Clear the memory of the agent.
+ +agents/agent/basic.py
195 +196 +197 +198 +199 |
|
create_puppeteer_interface()
+
+Create the puppeteer interface.
+ +agents/agent/basic.py
233 +234 +235 +236 +237 |
|
delete_memory(step)
+
+Delete the memory of the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/basic.py
188 +189 +190 +191 +192 +193 |
|
get_cls(name)
+
+
+ classmethod
+
+
+Retrieves an agent class from the registry.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/basic.py
350 +351 +352 +353 +354 +355 +356 +357 |
|
get_prompter()
+
+
+ abstractmethod
+
+
+Get the prompt for the agent.
+ + +Returns: | +
+
|
+
---|
agents/agent/basic.py
124 +125 +126 +127 +128 +129 +130 |
|
get_response(message, namescope, use_backup_engine, configs=configs)
+
+
+ classmethod
+
+
+Get the response for the prompt.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/basic.py
140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 |
|
handle(context)
+
+Handle the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/basic.py
220 +221 +222 +223 +224 +225 |
|
message_constructor()
+
+
+ abstractmethod
+
+
+Construct the message.
+ + +Returns: | +
+
|
+
---|
agents/agent/basic.py
132 +133 +134 +135 +136 +137 +138 |
|
print_response()
+
+Print the response.
+ +agents/agent/basic.py
335 +336 +337 +338 +339 |
|
process(context)
+
+Process the agent.
+ +agents/agent/basic.py
227 +228 +229 +230 +231 |
|
process_asker(ask_user=True)
+
+Ask for the process.
+ + +Parameters: | +
+
|
+
---|
agents/agent/basic.py
247 +248 +249 +250 +251 +252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 +268 +269 +270 +271 +272 +273 +274 +275 +276 +277 +278 |
|
process_comfirmation()
+
+
+ abstractmethod
+
+
+Confirm the process.
+ +agents/agent/basic.py
280 +281 +282 +283 +284 +285 |
|
process_resume()
+
+Resume the process.
+ +agents/agent/basic.py
239 +240 +241 +242 +243 +244 |
|
reflection()
+
+TODO: +Reflect on the action.
+ +agents/agent/basic.py
201 +202 +203 +204 +205 +206 |
|
response_to_dict(response)
+
+
+ staticmethod
+
+
+Convert the response to a dictionary.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
agents/agent/basic.py
156 +157 +158 +159 +160 +161 +162 +163 |
|
set_state(state)
+
+Set the state of the agent.
+ + +Parameters: | +
+
|
+
---|
agents/agent/basic.py
208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 |
|
The AI Tool Automator is a component of the UFO framework that enables the agent to interact with AI tools based on large language models (LLMs). The AI Tool Automator is designed to facilitate the integration of LLM-based AI tools into the UFO framework, enabling the agent to leverage the capabilities of these tools to perform complex tasks.
+Note
+UFO can also call in-app AI tools, such as Copilot
, to assist with the automation process. This is achieved by using either UI Automation
or API
to interact with the in-app AI tool. These in-app AI tools differ from the AI Tool Automator, which is designed to interact with external AI tools based on LLMs that are not integrated into the application.
The AI Tool Automator shares the same prompt configuration options as the UI Automator:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
API_PROMPT |
+The prompt for the UI automation API. | +String | +"ufo/prompts/share/base/api.yaml" | +
The AI Tool Automator shares the same receiver structure as the UI Automator. Please refer to the UI Automator Receiver section for more details.
+The command of the AI Tool Automator shares the same structure as the UI Automator. Please refer to the UI Automator Command section for more details. The list of available commands in the AI Tool Automator is shown below:
+Command Name | +Function Name | +Description | +
---|---|---|
AnnotationCommand |
+annotation |
+Annotate the control items on the screenshot. | +
SummaryCommand |
+summary |
+Summarize the observation of the current application window. | +
UFO allows the HostAgent
to execute bash commands on the host machine. The bash commands can be used to open applications or execute system commands. The Bash Automator
is implemented in the ufo/automator/app_apis/shell
module.
Note
+Only HostAgent
is currently supported by the Bash Automator.
The Web Automator receiver is the ShellReceiver
class defined in the ufo/automator/app_apis/shell/shell_client.py
file.
+ Bases: ReceiverBasic
The base class for Web COM client using crawl4ai.
+ +Initialize the shell client.
+ + + + + + +automator/app_apis/shell/shell_client.py
19 +20 +21 +22 |
|
run_shell(params)
+
+Run the command.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/app_apis/shell/shell_client.py
24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 |
|
We now only support one command in the Bash Automator to execute a bash command on the host machine.
+@ShellReceiver.register
+class RunShellCommand(ShellCommand):
+ """
+ The command to run the crawler with various options.
+ """
+
+ def execute(self):
+ """
+ Execute the command to run the crawler.
+ :return: The result content.
+ """
+ return self.receiver.run_shell(params=self.params)
+
+ @classmethod
+ def name(cls) -> str:
+ """
+ The name of the command.
+ """
+ return "run_shell"
+
+Below is the list of available commands in the Web Automator that are currently supported by UFO:
+Command Name | +Function Name | +Description | +
---|---|---|
RunShellCommand |
+run_shell |
+Get the content of a web page into a markdown format. | +
The Automator application is a tool that allows UFO to automate and take actions on applications. Currently, UFO supports two types of actions: UI Automation
and API
.
Note
+UFO can also call in-app AI tools, such as Copilot
, to assist with the automation process. This is achieved by using either UI Automation
or API
to interact with the in-app AI tool.
Actions in UFO are implemented using the command design pattern, which encapsulates a receiver, a command, and an invoker. The receiver is the object that performs the action, the command is the object that encapsulates the action, and the invoker is the object that triggers the action.
+The basic classes for implementing actions in UFO are as follows:
+Role | +Class | +Description | +
---|---|---|
Receiver | +ufo.automator.basic.ReceiverBasic |
+The base class for all receivers in UFO. Receivers are objects that perform actions on applications. | +
Command | +ufo.automator.basic.CommandBasic |
+The base class for all commands in UFO. Commands are objects that encapsulate actions to be performed by receivers. | +
Invoker | +ufo.automator.puppeteer.AppPuppeteer |
+The base class for the invoker in UFO. Invokers are objects that trigger commands to be executed by receivers. | +
The advantage of using the command design pattern in the agent framework is that it allows for the decoupling of the sender and receiver of the action. This decoupling enables the agent to execute actions on different objects without knowing the details of the object or the action being performed, making the agent more flexible and extensible for new actions.
+The Receiver
is a central component in the Automator application that performs actions on the application. It provides functionalities to interact with the application and execute the action. All available actions are registered in the with the ReceiverManager
class.
You can find the reference for a basic Receiver
class below:
+ Bases: ABC
The abstract receiver interface.
+ + + + + + + + + +command_registry: Dict[str, Type[CommandBasic]]
+
+
+ property
+
+
+Get the command registry.
+supported_command_names: List[str]
+
+
+ property
+
+
+Get the command name list.
+register(command_class)
+
+
+ classmethod
+
+
+Decorator to register the state class to the state manager.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/basic.py
46 +47 +48 +49 +50 +51 +52 +53 +54 |
|
register_command(command_name, command)
+
+Add to the command registry.
+ + +Parameters: | +
+
|
+
---|
automator/basic.py
24 +25 +26 +27 +28 +29 +30 +31 |
|
self_command_mapping()
+
+Get the command-receiver mapping.
+ +automator/basic.py
40 +41 +42 +43 +44 |
|
The Command
is a specific action that the Receiver
can perform on the application. It encapsulates the function and parameters required to execute the action. The Command
class is a base class for all commands in the Automator application.
You can find the reference for a basic Command
class below:
+ Bases: ABC
The abstract command interface.
+ +Initialize the command.
+ + +Parameters: | +
+
|
+
---|
automator/basic.py
67 +68 +69 +70 +71 +72 +73 |
|
execute()
+
+
+ abstractmethod
+
+
+Execute the command.
+ +automator/basic.py
75 +76 +77 +78 +79 +80 |
|
redo()
+
+Redo the command.
+ +automator/basic.py
88 +89 +90 +91 +92 |
|
undo()
+
+Undo the command.
+ +automator/basic.py
82 +83 +84 +85 +86 |
|
Note
+Each command must register with a specific Receiver
to be executed using the register_command
decorator. For example:
+ @ReceiverExample.register
+ class CommandExample(CommandBasic):
+ ...
The AppPuppeteer
plays the role of the invoker in the Automator application. It triggers the commands to be executed by the receivers. The AppPuppeteer
equips the AppAgent
with the capability to interact with the application's UI controls. It provides functionalities to translate action strings into specific actions and execute them. All available actions are registered in the Puppeteer
with the ReceiverManager
class.
You can find the implementation of the AppPuppeteer
class in the ufo/automator/puppeteer.py
file, and its reference is shown below.
The class for the app puppeteer to automate the app in the Windows environment.
+ +Initialize the app puppeteer.
+ + +Parameters: | +
+
|
+
---|
automator/puppeteer.py
22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 |
|
full_path: str
+
+
+ property
+
+
+Get the full path of the process. Only works for COM receiver.
+ + +Returns: | +
+
|
+
---|
add_command(command_name, params, *args, **kwargs)
+
+Add the command to the command queue.
+ + +Parameters: | +
+
|
+
---|
automator/puppeteer.py
94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 |
|
close()
+
+Close the app. Only works for COM receiver.
+ +automator/puppeteer.py
145 +146 +147 +148 +149 +150 +151 |
|
create_command(command_name, params, *args, **kwargs)
+
+Create the command.
+ + +Parameters: | +
+
|
+
---|
automator/puppeteer.py
34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 |
|
execute_all_commands()
+
+Execute all the commands in the command queue.
+ + +Returns: | +
+
|
+
---|
automator/puppeteer.py
82 +83 +84 +85 +86 +87 +88 +89 +90 +91 +92 |
|
execute_command(command_name, params, *args, **kwargs)
+
+Execute the command.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/puppeteer.py
68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 +80 |
|
get_command_queue_length()
+
+Get the length of the command queue.
+ + +Returns: | +
+
|
+
---|
automator/puppeteer.py
105 +106 +107 +108 +109 +110 |
|
get_command_string(command_name, params)
+
+
+ staticmethod
+
+
+Generate a function call string.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/puppeteer.py
153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 +164 +165 |
|
get_command_types(command_name)
+
+Get the command types.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/puppeteer.py
53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 |
|
save()
+
+Save the current state of the app. Only works for COM receiver.
+ +automator/puppeteer.py
124 +125 +126 +127 +128 +129 +130 |
|
save_to_xml(file_path)
+
+Save the current state of the app to XML. Only works for COM receiver.
+ + +Parameters: | +
+
|
+
---|
automator/puppeteer.py
132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 |
|
The ReceiverManager
manages all the receivers and commands in the Automator application. It provides functionalities to register and retrieve receivers and commands. It is a complementary component to the AppPuppeteer
.
The class for the receiver manager.
+ +Initialize the receiver manager.
+ + + + + + +automator/puppeteer.py
175 +176 +177 +178 +179 +180 +181 +182 +183 |
|
com_receiver: WinCOMReceiverBasic
+
+
+ property
+
+
+Get the COM receiver.
+ + +Returns: | +
+
|
+
---|
receiver_factory_registry: Dict[str, Dict[str, Union[str, ReceiverFactory]]]
+
+
+ property
+
+
+Get the receiver factory registry.
+ + +Returns: | +
+
|
+
---|
receiver_list: List[ReceiverBasic]
+
+
+ property
+
+
+Get the receiver list.
+ + +Returns: | +
+
|
+
---|
create_api_receiver(app_root_name, process_name)
+
+Get the API receiver.
+ + +Parameters: | +
+
|
+
---|
automator/puppeteer.py
208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 +220 +221 +222 +223 +224 |
|
create_ui_control_receiver(control, application)
+
+Build the UI controller.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/puppeteer.py
185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 +199 +200 +201 +202 +203 +204 +205 +206 |
|
get_receiver_from_command_name(command_name)
+
+Get the receiver from the command name.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/puppeteer.py
235 +236 +237 +238 +239 +240 +241 +242 +243 +244 |
|
register(receiver_factory_class)
+
+
+ classmethod
+
+
+Decorator to register the receiver factory class to the receiver manager.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/puppeteer.py
276 +277 +278 +279 +280 +281 +282 +283 +284 +285 +286 +287 +288 +289 |
|
For further details, refer to the specific documentation for each component and class in the Automator module.
+ +The UI Automator enables to mimic the operations of mouse and keyboard on the application's UI controls. UFO uses the UIA or Win32 APIs to interact with the application's UI controls, such as buttons, edit boxes, and menus.
+There are several configurations that need to be set up before using the UI Automator in the config_dev.yaml
file. Below is the list of configurations related to the UI Automator:
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
CONTROL_BACKEND |
+The backend for control action, currently supporting uia and win32 . |
+String | +"uia" | +
CONTROL_LIST |
+The list of widgets allowed to be selected. | +List | +["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"] | +
ANNOTATION_COLORS |
+The colors assigned to different control types for annotation. | +Dictionary | +{"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"} | +
API_PROMPT |
+The prompt for the UI automation API. | +String | +"ufo/prompts/share/base/api.yaml" | +
CLICK_API |
+The API used for click action, can be click_input or click . |
+String | +"click_input" | +
INPUT_TEXT_API |
+The API used for input text action, can be type_keys or set_text . |
+String | +"type_keys" | +
INPUT_TEXT_ENTER |
+Whether to press enter after typing the text. | +Boolean | +False | +
The receiver of the UI Automator is the ControlReceiver
class defined in the ufo/automator/ui_control/controller/control_receiver
module. It is initialized with the application's window handle and control wrapper that executes the actions. The ControlReceiver
provides functionalities to interact with the application's UI controls. Below is the reference for the ControlReceiver
class:
+ Bases: ReceiverBasic
The control receiver class.
+ +Initialize the control receiver.
+ + +Parameters: | +
+
|
+
---|
automator/ui_control/controller.py
33 +34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 |
|
annotation(params, annotation_dict)
+
+Take a screenshot of the current application window and annotate the control item on the screenshot.
+ + +Parameters: | +
+
|
+
---|
automator/ui_control/controller.py
240 +241 +242 +243 +244 +245 +246 +247 +248 +249 +250 +251 +252 +253 +254 |
|
atomic_execution(method_name, params)
+
+Atomic execution of the action on the control elements.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 |
|
click_input(params)
+
+Click the control element.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
79 +80 +81 +82 +83 +84 +85 +86 +87 +88 +89 +90 +91 |
|
click_on_coordinates(params)
+
+Click on the coordinates of the control element.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 |
|
drag_on_coordinates(params)
+
+Drag on the coordinates of the control element.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 |
|
keyboard_input(params)
+
+Keyboard input on the control element.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
201 +202 +203 +204 +205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 |
|
no_action()
+
+No action on the control element.
+ + +Returns: | +
+
|
+
---|
automator/ui_control/controller.py
232 +233 +234 +235 +236 +237 +238 |
|
set_edit_text(params)
+
+Set the edit text of the control element.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 +164 +165 +166 +167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 +199 |
|
summary(params)
+
+Visual summary of the control element.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
141 +142 +143 +144 +145 +146 +147 +148 |
|
texts()
+
+Get the text of the control element.
+ + +Returns: | +
+
|
+
---|
automator/ui_control/controller.py
217 +218 +219 +220 +221 +222 |
|
transform_point(fraction_x, fraction_y)
+
+Transform the relative coordinates to the absolute coordinates.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
282 +283 +284 +285 +286 +287 +288 +289 +290 +291 +292 +293 +294 +295 +296 +297 +298 |
|
wait_enabled(timeout=10, retry_interval=0.5)
+
+Wait until the control is enabled.
+ + +Parameters: | +
+
|
+
---|
automator/ui_control/controller.py
256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 |
|
wait_visible(timeout=10, retry_interval=0.5)
+
+Wait until the window is enabled.
+ + +Parameters: | +
+
|
+
---|
automator/ui_control/controller.py
269 +270 +271 +272 +273 +274 +275 +276 +277 +278 +279 +280 |
|
wheel_mouse_input(params)
+
+Wheel mouse input on the control element.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/controller.py
224 +225 +226 +227 +228 +229 +230 |
|
The command of the UI Automator is the ControlCommand
class defined in the ufo/automator/ui_control/controller/ControlCommand
module. It encapsulates the function and parameters required to execute the action. The ControlCommand
class is a base class for all commands in the UI Automator application. Below is an example of a ClickInputCommand
class that inherits from the ControlCommand
class:
@ControlReceiver.register
+class ClickInputCommand(ControlCommand):
+ """
+ The click input command class.
+ """
+
+ def execute(self) -> str:
+ """
+ Execute the click input command.
+ :return: The result of the click input command.
+ """
+ return self.receiver.click_input(self.params)
+
+ @classmethod
+ def name(cls) -> str:
+ """
+ Get the name of the atomic command.
+ :return: The name of the atomic command.
+ """
+ return "click_input"
+
+Note
+The concrete command classes must implement the execute
method to execute the action and the name
method to return the name of the atomic command.
Note
+Each command must register with a specific ControlReceiver
to be executed using the @ControlReceiver.register
decorator.
Below is the list of available commands in the UI Automator that are currently supported by UFO:
+Command Name | +Function Name | +Description | +
---|---|---|
ClickInputCommand |
+click_input |
+Click the control item with the mouse. | +
ClickOnCoordinatesCommand |
+click_on_coordinates |
+Click on the specific fractional coordinates of the application window. | +
DragOnCoordinatesCommand |
+drag_on_coordinates |
+Drag the mouse on the specific fractional coordinates of the application window. | +
SetEditTextCommand |
+set_edit_text |
+Add new text to the control item. | +
GetTextsCommand |
+texts |
+Get the text of the control item. | +
WheelMouseInputCommand |
+wheel_mouse_input |
+Scroll the control item. | +
KeyboardInputCommand |
+keyboard_input |
+Simulate the keyboard input. | +
Tip
+Please refer to the ufo/prompts/share/base/api.yaml
file for the detailed API documentation of the UI Automator.
Tip
+You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand
module.
We also support the use of the Web Automator
to get the content of a web page. The Web Automator
is implemented in ufo/autoamtor/app_apis/web
module.
There are several configurations that need to be set up before using the API Automator in the config_dev.yaml
file. Below is the list of configurations related to the API Automator:
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
USE_APIS |
+Whether to allow the use of application APIs. | +Boolean | +True | +
APP_API_PROMPT_ADDRESS |
+The prompt address for the application API. | +Dict | +{"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} | +
Note
+Only msedge.exe
and chrome.exe
are currently supported by the Web Automator.
The Web Automator receiver is the WebReceiver
class defined in the ufo/automator/app_apis/web/webclient.py
module:
+ Bases: ReceiverBasic
The base class for Web COM client using crawl4ai.
+ +Initialize the Web COM client.
+ + + + + + +automator/app_apis/web/webclient.py
21 +22 +23 +24 +25 +26 +27 |
|
web_crawler(url, ignore_link)
+
+Run the crawler with various options.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/app_apis/web/webclient.py
29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 |
|
We now only support one command in the Web Automator to get the content of a web page into a markdown format. More commands will be added in the future for the Web Automator.
+@WebReceiver.register
+class WebCrawlerCommand(WebCommand):
+ """
+ The command to run the crawler with various options.
+ """
+
+ def execute(self):
+ """
+ Execute the command to run the crawler.
+ :return: The result content.
+ """
+ return self.receiver.web_crawler(
+ url=self.params.get("url"),
+ ignore_link=self.params.get("ignore_link", False),
+ )
+
+ @classmethod
+ def name(cls) -> str:
+ """
+ The name of the command.
+ """
+ return "web_crawler"
+
+Below is the list of available commands in the Web Automator that are currently supported by UFO:
+Command Name | +Function Name | +Description | +
---|---|---|
WebCrawlerCommand |
+web_crawler |
+Get the content of a web page into a markdown format. | +
Tip
+Please refer to the ufo/prompts/apps/web/api.yaml
file for the prompt details for the WebCrawlerCommand
command.
UFO currently support the use of Win32 API
API automator to interact with the application's native API. We implement them in python using the pywin32
library. The API automator now supports Word
and Excel
applications, and we are working on extending the support to other applications.
There are several configurations that need to be set up before using the API Automator in the config_dev.yaml
file. Below is the list of configurations related to the API Automator:
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
USE_APIS |
+Whether to allow the use of application APIs. | +Boolean | +True | +
APP_API_PROMPT_ADDRESS |
+The prompt address for the application API. | +Dict | +{"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} | +
Note
+Only WINWORD.EXE
and EXCEL.EXE
are currently supported by the API Automator.
The base class for the receiver of the API Automator is the WinCOMReceiverBasic
class defined in the ufo/automator/app_apis/basic
module. It is initialized with the application's win32 com object and provides functionalities to interact with the application's native API. Below is the reference for the WinCOMReceiverBasic
class:
+ Bases: ReceiverBasic
The base class for Windows COM client.
+ +Initialize the Windows COM client.
+ + +Parameters: | +
+
|
+
---|
automator/app_apis/basic.py
20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 |
|
full_path: str
+
+
+ property
+
+
+Get the full path of the process.
+ + +Returns: | +
+
|
+
---|
app_match(object_name_list)
+
+Check if the process name matches the app root.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/app_apis/basic.py
57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 |
|
close()
+
+Close the app.
+ +automator/app_apis/basic.py
110 +111 +112 +113 +114 +115 +116 +117 |
|
get_object_from_process_name()
+
+
+ abstractmethod
+
+
+Get the object from the process name.
+ +automator/app_apis/basic.py
36 +37 +38 +39 +40 +41 |
|
get_suffix_mapping()
+
+Get the suffix mapping.
+ + +Returns: | +
+
|
+
---|
automator/app_apis/basic.py
43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 |
|
longest_common_substring_length(str1, str2)
+
+
+ staticmethod
+
+
+Get the longest common substring of two strings.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/app_apis/basic.py
127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 |
|
save()
+
+Save the current state of the app.
+ +automator/app_apis/basic.py
91 +92 +93 +94 +95 +96 +97 +98 |
|
save_to_xml(file_path)
+
+Save the current state of the app to XML.
+ + +Parameters: | +
+
|
+
---|
automator/app_apis/basic.py
100 +101 +102 +103 +104 +105 +106 +107 +108 |
|
The receiver of Word
and Excel
applications inherit from the WinCOMReceiverBasic
class. The WordReceiver
and ExcelReceiver
classes are defined in the ufo/automator/app_apis/word
and ufo/automator/app_apis/excel
modules, respectively:
The command of the API Automator for the Word
and Excel
applications in located in the client
module in the ufo/automator/app_apis/{app_name}
folder inheriting from the WinCOMCommand
class. It encapsulates the function and parameters required to execute the action. Below is an example of a WordCommand
class that inherits from the SelectTextCommand
class:
@WordWinCOMReceiver.register
+class SelectTextCommand(WinCOMCommand):
+ """
+ The command to select text.
+ """
+
+ def execute(self):
+ """
+ Execute the command to select text.
+ :return: The selected text.
+ """
+ return self.receiver.select_text(self.params.get("text"))
+
+ @classmethod
+ def name(cls) -> str:
+ """
+ The name of the command.
+ """
+ return "select_text"
+
+Note
+The concrete command classes must implement the execute
method to execute the action and the name
method to return the name of the atomic command.
Note
+Each command must register with a concrete WinCOMReceiver
to be executed using the register
decorator.
Below is the list of available commands in the API Automator that are currently supported by UFO:
+Command Name | +Function Name | +Description | +
---|---|---|
InsertTableCommand |
+insert_table |
+Insert a table to a Word document. | +
SelectTextCommand |
+select_text |
+Select the text in a Word document. | +
SelectTableCommand |
+select_table |
+Select a table in a Word document. | +
Command Name | +Function Name | +Description | +
---|---|---|
GetSheetContentCommand |
+get_sheet_content |
+Get the content of a sheet in the Excel app. | +
Table2MarkdownCommand |
+table2markdown |
+Convert the table content in a sheet of the Excel app to markdown format. | +
InsertExcelTableCommand |
+insert_excel_table |
+Insert a table to the Excel sheet. | +
Tip
+Please refer to the ufo/prompts/apps/{app_name}/api.yaml
file for the prompt details for the commands.
Tip
+You can customize the commands by adding new command classes to the ufo/automator/app_apis/{app_name}/
module.
This section provides detailed information on how to configure the UFO agent for developers. The configuration file config_dev.yaml
is located in the ufo/config
directory and contains various settings and switches to customize the UFO agent for development purposes.
The following parameters are included in the system configuration of the UFO agent:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
CONTROL_BACKEND |
+The backend for control action, currently supporting uia and win32 . |
+String | +"uia" | +
MAX_STEP |
+The maximum step limit for completing the user request in a session. | +Integer | +100 | +
SLEEP_TIME |
+The sleep time in seconds between each step to wait for the window to be ready. | +Integer | +5 | +
RECTANGLE_TIME |
+The time in seconds for the rectangle display around the selected control. | +Integer | +1 | +
SAFE_GUARD |
+Whether to use the safe guard to ask for user confirmation before performing sensitive operations. | +Boolean | +True | +
CONTROL_LIST |
+The list of widgets allowed to be selected. | +List | +["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"] | +
HISTORY_KEYS |
+The keys of the step history added to the Blackboard for agent decision-making. |
+List | +["Step", "Thought", "ControlText", "Subtask", "Action", "Comment", "Results", "UserConfirm"] | +
ANNOTATION_COLORS |
+The colors assigned to different control types for annotation. | +Dictionary | +{"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"} | +
PRINT_LOG |
+Whether to print the log in the console. | +Boolean | +False | +
CONCAT_SCREENSHOT |
+Whether to concatenate the screenshots into a single image for the LLM input. | +Boolean | +False | +
INCLUDE_LAST_SCREENSHOT |
+Whether to include the screenshot from the last step in the observation. | +Boolean | +True | +
LOG_LEVEL |
+The log level for the UFO agent. | +String | +"DEBUG" | +
REQUEST_TIMEOUT |
+The call timeout in seconds for the LLM model. | +Integer | +250 | +
USE_APIS |
+Whether to allow the use of application APIs. | +Boolean | +True | +
LOG_XML |
+Whether to log the XML file at every step. | +Boolean | +False | +
SCREENSHOT_TO_MEMORY |
+Whether to allow the screenshot to Blackboard for the agent's decision making. |
+Boolean | +True | +
SAVE_UI_TREE |
+Whether to save the UI tree in the log. | +Boolean | +False | +
The main prompt templates include the prompts in the UFO agent for both system
and user
roles.
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
HOSTAGENT_PROMPT |
+The main prompt template for the HostAgent . |
+String | +"ufo/prompts/share/base/host_agent.yaml" | +
APPAGENT_PROMPT |
+The main prompt template for the AppAgent . |
+String | +"ufo/prompts/share/base/app_agent.yaml" | +
FOLLOWERAGENT_PROMPT |
+The main prompt template for the FollowerAgent . |
+String | +"ufo/prompts/share/base/app_agent.yaml" | +
EVALUATION_PROMPT |
+The prompt template for the evaluation. | +String | +"ufo/prompts/evaluation/evaluate.yaml" | +
Lite versions of the main prompt templates can be found in the ufo/prompts/share/lite
directory to reduce the input size for specific token limits.
Example prompt templates are used for demonstration purposes in the UFO agent.
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
HOSTAGENT_EXAMPLE_PROMPT |
+The example prompt template for the HostAgent used for demonstration. |
+String | +"ufo/prompts/examples/{mode}/host_agent_example.yaml" | +
APPAGENT_EXAMPLE_PROMPT |
+The example prompt template for the AppAgent used for demonstration. |
+String | +"ufo/prompts/examples/{mode}/app_agent_example.yaml" | +
Lite versions of the example prompt templates can be found in the ufo/prompts/examples/lite/{mode}
directory to reduce the input size for demonstration purposes.
These configuration parameters are used for experience and demonstration learning in the UFO agent.
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
EXPERIENCE_PROMPT |
+The prompt for self-experience learning. | +String | +"ufo/prompts/experience/experience_summary.yaml" | +
EXPERIENCE_SAVED_PATH |
+The path to save the experience learning data. | +String | +"vectordb/experience/" | +
DEMONSTRATION_PROMPT |
+The prompt for user demonstration learning. | +String | +"ufo/prompts/demonstration/demonstration_summary.yaml" | +
DEMONSTRATION_SAVED_PATH |
+The path to save the demonstration learning data. | +String | +"vectordb/demonstration/" | +
These prompt configuration parameters are used for the application and control APIs in the UFO agent.
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
API_PROMPT |
+The prompt for the UI automation API. | +String | +"ufo/prompts/share/base/api.yaml" | +
APP_API_PROMPT_ADDRESS |
+The prompt address for the application API. | +Dict | +{"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} | +
The API configuration parameters are used for the pywinauto API in the UFO agent.
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
CLICK_API |
+The API used for click action, can be click_input or click . |
+String | +"click_input" | +
INPUT_TEXT_API |
+The API used for input text action, can be type_keys or set_text . |
+String | +"type_keys" | +
INPUT_TEXT_ENTER |
+Whether to press enter after typing the text. | +Boolean | +False | +
The control filtering configuration parameters are used for control filtering in the agent's observation.
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
CONTROL_FILTER |
+The control filter type, can be TEXT , SEMANTIC , or ICON . |
+List | +[] | +
CONTROL_FILTER_TOP_K_PLAN |
+The control filter effect on top k plans from the agent. | +Integer | +2 | +
CONTROL_FILTER_TOP_K_SEMANTIC |
+The control filter top k for semantic similarity. | +Integer | +15 | +
CONTROL_FILTER_TOP_K_ICON |
+The control filter top k for icon similarity. | +Integer | +15 | +
CONTROL_FILTER_MODEL_SEMANTIC_NAME |
+The control filter model name for semantic similarity. | +String | +"all-MiniLM-L6-v2" | +
CONTROL_FILTER_MODEL_ICON_NAME |
+The control filter model name for icon similarity. | +String | +"clip-ViT-B-32" | +
The customization configuration parameters are used for customizations in the UFO agent.
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
ASK_QUESTION |
+Whether to ask the user for a question. | +Boolean | +True | +
USE_CUSTOMIZATION |
+Whether to enable the customization. | +Boolean | +True | +
QA_PAIR_FILE |
+The path for the historical QA pairs. | +String | +"customization/historical_qa.txt" | +
QA_PAIR_NUM |
+The number of QA pairs for the customization. | +Integer | +20 | +
The evaluation configuration parameters are used for the evaluation in the UFO agent.
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
EVA_SESSION |
+Whether to include the session in the evaluation. | +Boolean | +True | +
EVA_ROUND |
+Whether to include the round in the evaluation. | +Boolean | +False | +
EVA_ALL_SCREENSHOTS |
+Whether to include all the screenshots in the evaluation. | +Boolean | +True | +
You can customize the configuration parameters in the config_dev.yaml
file to suit your development needs and enhance the functionality of the UFO agent.
We provide a configuration file pricing_config.yaml
to calculate the pricing of the UFO agent using different LLM APIs. The pricing configuration file is located in the ufo/config
directory. Note that the pricing configuration file is only used for reference and may not be up-to-date. Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.
You can also customize the pricing configuration file based on the configured model names and their respective input and output prices by adding or modifying the pricing information in the pricing_config.yaml
file. Below is the default pricing configuration:
# Prices in $ per 1000 tokens
+# Last updated: 2024-05-13
+PRICES: {
+ "openai/gpt-4-0613": {"input": 0.03, "output": 0.06},
+ "openai/gpt-3.5-turbo-0613": {"input": 0.0015, "output": 0.002},
+ "openai/gpt-4-0125-preview": {"input": 0.01, "output": 0.03},
+ "openai/gpt-4-1106-preview": {"input": 0.01, "output": 0.03},
+ "openai/gpt-4-1106-vision-preview": {"input": 0.01, "output": 0.03},
+ "openai/gpt-4": {"input": 0.03, "output": 0.06},
+ "openai/gpt-4-32k": {"input": 0.06, "output": 0.12},
+ "openai/gpt-4-turbo": {"input":0.01,"output": 0.03},
+ "openai/gpt-4o": {"input": 0.005,"output": 0.015},
+ "openai/gpt-4o-2024-05-13": {"input": 0.005, "output": 0.015},
+ "openai/gpt-3.5-turbo-0125": {"input": 0.0005, "output": 0.0015},
+ "openai/gpt-3.5-turbo-1106": {"input": 0.001, "output": 0.002},
+ "openai/gpt-3.5-turbo-instruct": {"input": 0.0015, "output": 0.002},
+ "openai/gpt-3.5-turbo-16k-0613": {"input": 0.003, "output": 0.004},
+ "openai/whisper-1": {"input": 0.006, "output": 0.006},
+ "openai/tts-1": {"input": 0.015, "output": 0.015},
+ "openai/tts-hd-1": {"input": 0.03, "output": 0.03},
+ "openai/text-embedding-ada-002-v2": {"input": 0.0001, "output": 0.0001},
+ "openai/text-davinci:003": {"input": 0.02, "output": 0.02},
+ "openai/text-ada-001": {"input": 0.0004, "output": 0.0004},
+ "azure/gpt-35-turbo-20220309":{"input": 0.0015, "output": 0.002},
+ "azure/gpt-35-turbo-20230613":{"input": 0.0015, "output": 0.002},
+ "azure/gpt-35-turbo-16k-20230613":{"input": 0.003, "output": 0.004},
+ "azure/gpt-35-turbo-1106":{"input": 0.001, "output": 0.002},
+ "azure/gpt-4-20230321":{"input": 0.03, "output": 0.06},
+ "azure/gpt-4-32k-20230321":{"input": 0.06, "output": 0.12},
+ "azure/gpt-4-1106-preview": {"input": 0.01, "output": 0.03},
+ "azure/gpt-4-0125-preview": {"input": 0.01, "output": 0.03},
+ "azure/gpt-4-visual-preview": {"input": 0.01, "output": 0.03},
+ "azure/gpt-4-turbo-20240409": {"input":0.01,"output": 0.03},
+ "azure/gpt-4o": {"input": 0.005,"output": 0.015},
+ "azure/gpt-4o-20240513": {"input": 0.005, "output": 0.015},
+ "qwen/qwen-vl-plus": {"input": 0.008, "output": 0.008},
+ "qwen/qwen-vl-max": {"input": 0.02, "output": 0.02},
+ "gemini/gemini-1.5-flash": {"input": 0.00035, "output": 0.00105},
+ "gemini/gemini-1.5-pro": {"input": 0.0035, "output": 0.0105},
+ "gemini/gemini-1.0-pro": {"input": 0.0005, "output": 0.0015},
+}
+
+Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.
+ +An overview of the user configuration options available in UFO. You need to rename the config.yaml.template
in the folder ufo/config
to config.yaml
to configure the LLMs and other custom settings.
You can configure the LLMs for the HOST_AGENT
and APP_AGENT
separately in the config.yaml
file. The FollowerAgent
and EvaluationAgent
share the same LLM configuration as the APP_AGENT
. Additionally, you can configure a backup LLM engine in the BACKUP_AGENT
field to handle cases where the primary engines fail during inference.
Below are the configuration options for the LLMs, using OpenAI and Azure OpenAI (AOAI) as examples. You can find the settings for other LLM API configurations and usage in the Supported Models
section of the documentation.
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
VISUAL_MODE |
+Whether to use visual mode to understand screenshots and take actions | +Boolean | +True | +
API_TYPE |
+The API type: "openai" for the OpenAI API, "aoai" for the AOAI API. | +String | +"openai" | +
API_BASE |
+The API endpoint for the LLM | +String | +"https://api.openai.com/v1/chat/completions" | +
API_KEY |
+The API key for the LLM | +String | +"sk-" | +
API_VERSION |
+The version of the API | +String | +"2024-02-15-preview" | +
API_MODEL |
+The LLM model name | +String | +"gpt-4-vision-preview" | +
The following additional configuration option is available for the AOAI API:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
API_DEPLOYMENT_ID |
+The deployment ID, only available for the AOAI API | +String | +"" | +
Ensure to fill in the necessary API details for both the HOST_AGENT
and APP_AGENT
to enable UFO to interact with the LLMs effectively.
You can also configure additional parameters for the LLMs in the config.yaml
file:
Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
MAX_TOKENS |
+The maximum token limit for the response completion | +Integer | +2000 | +
MAX_RETRY |
+The maximum retry limit for the response completion | +Integer | +3 | +
TEMPERATURE |
+The temperature of the model: the lower the value, the more consistent the output of the model | +Float | +0.0 | +
TOP_P |
+The top_p of the model: the lower the value, the more conservative the output of the model | +Float | +0.0 | +
TIMEOUT |
+The call timeout in seconds | +Integer | +60 | +
You can configure the RAG parameters in the config.yaml
file to enhance the UFO agent with additional knowledge sources:
Configure the following parameters to allow UFO to use offline documents for the decision-making process:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_OFFLINE_DOCS |
+Whether to use the offline RAG | +Boolean | +False | +
RAG_OFFLINE_DOCS_RETRIEVED_TOPK |
+The topk for the offline retrieved documents | +Integer | +1 | +
Configure the following parameters to allow UFO to use online Bing search for the decision-making process:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_ONLINE_SEARCH |
+Whether to use the Bing search | +Boolean | +False | +
BING_API_KEY |
+The Bing search API key | +String | +"" | +
RAG_ONLINE_SEARCH_TOPK |
+The topk for the online search | +Integer | +5 | +
RAG_ONLINE_RETRIEVED_TOPK |
+The topk for the online retrieved searched results | +Integer | +1 | +
Configure the following parameters to allow UFO to use the RAG from its self-experience:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_EXPERIENCE |
+Whether to use the RAG from its self-experience | +Boolean | +False | +
RAG_EXPERIENCE_RETRIEVED_TOPK |
+The topk for the offline retrieved documents | +Integer | +5 | +
Configure the following parameters to allow UFO to use the RAG from user demonstration:
+Configuration Option | +Description | +Type | +Default Value | +
---|---|---|---|
RAG_DEMONSTRATION |
+Whether to use the RAG from its user demonstration | +Boolean | +False | +
RAG_DEMONSTRATION_RETRIEVED_TOPK |
+The topk for the offline retrieved documents | +Integer | +5 | +
RAG_DEMONSTRATION_COMPLETION_N |
+The number of completion choices for the demonstration result | +Integer | +3 | +
Explore the various RAG configurations to enhance the UFO agent with additional knowledge sources and improve its decision-making capabilities.
+ +Users or application developers can provide human demonstrations to the AppAgent
to guide it in executing similar tasks in the future. The AppAgent
uses these demonstrations to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.
Currently, UFO supports learning from user trajectories recorded by Steps Recorder integrated within Windows. More tools will be supported in the future.
+Follow the official guidance to use Steps Recorder to record user demonstrations.
+Include any specific details or instructions for UFO to notice by adding comments. Since Steps Recorder doesn't capture typed text, include any necessary typed content in the comments as well.
++ +
+ +Review the recorded steps and save them to a ZIP file. Refer to the sample_record.zip for an example of recorded steps for a specific request, such as "sending an email to example@gmail.com to say hi."
+Once you have your demonstration record ZIP file ready, you can parse it as an example to support RAG for UFO. Follow these steps:
+# Assume you are in the cloned UFO folder
+python -m record_processor -r "<your request for the demonstration>" -p "<record ZIP file path>"
+
+<your request for the demonstration>
with the specific request, such as "sending an email to example@gmail.com to say hi."<record ZIP file path>
with the full path to the ZIP file you just created.This command will parse the record and summarize it into an execution plan. You'll see a confirmation message similar to the following:
+Here are the plans summarized from your demonstration:
+Plan [1]
+(1) Input the email address 'example@gmail.com' in the 'To' field.
+(2) Input the subject of the email. I need to input 'Greetings'.
+(3) Input the content of the email. I need to input 'Hello,\nI hope this message finds you well. I am writing to send you a warm greeting and to wish you a great day.\nBest regards.'
+(4) Click the Send button to send the email.
+Plan [2]
+(1) ***
+(2) ***
+(3) ***
+Plan [3]
+(1) ***
+(2) ***
+(3) ***
+Would you like to save any one of them as a future reference for the agent? Press [1] [2] [3] to save the corresponding plan, or press any other key to skip.
+
+Press 1
to save the plan into its memory for future reference. A sample can be found here.
You can view a demonstration video below:
+ + +After creating the offline indexer, refer to the Learning from User Demonstrations section for guidance on how to use human demonstrations to enhance the AppAgent.
+Help documents provide guidance to the AppAgent
in executing specific tasks. The AppAgent
uses these documents to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.
Currently, UFO supports processing help documents in XML format, which is the default format for official help documents of Microsoft apps. More formats will be supported in the future.
+To create a dedicated document for a specific task of an app, save it in a file named, for example, task.xml
. This document should be accompanied by a metadata file with the same prefix but with the .meta
extension, such as task.xml.meta
. The metadata file should include:
title
: Describes the task at a high level.Content-Summary
: Summarizes the content of the help document.These two files are used for similarity search with user requests, so it is important to write them carefully. Examples of a help document and its metadata can be found here and here.
+Once you have prepared all help documents and their metadata, place them into a folder. Sub-folders for the help documents are allowed, but ensure that each help document and its corresponding metadata are placed in the same directory.
+After organizing your documents in a folder named path_of_the_docs
, you can create an offline indexer to support RAG for UFO. Follow these steps:
# Assume you are in the cloned UFO folder
+python -m learner --app <app_name> --docs <path_of_the_docs>
+
+<app_name>
with the name of the application, such as PowerPoint or WeChat.<path_of_the_docs>
with the full path to the folder containing all your documents.This command will create an offline indexer for all documents in the path_of_the_docs
folder using Faiss and embedding with sentence transformer (additional embeddings will be supported soon). By default, the created index will be placed here.
Note
+Ensure the app_name
is accurately defined, as it is used to match the offline indexer in online RAG.
After creating the offline indexer, you can find the guidance on how to use the help documents to enhance the AppAgent in the Learning from Help Documents section.
+ +UFO provides a flexible framework and SDK for application developers to empower their applications with AI capabilities by wrapping them into an AppAgent
. By creating an AppAgent
, you can leverage the power of UFO to interact with your application and automate tasks.
To create an AppAgent
, you can provide the following components:
Component | +Description | +Usage Documentation | +
---|---|---|
Help Documents | +The help documents for the application to guide the AppAgent in executing tasks. |
+Learning from Help Documents | +
User Demonstrations | +The user demonstrations for the application to guide the AppAgent in executing tasks. |
+Learning from User Demonstrations | +
Native API Wrappers | +The native API wrappers for the application to interact with the application. | +Automator | +
UFO takes actions on applications based on UI controls, but providing native API to its toolboxes can enhance the efficiency and accuracy of the actions. This document provides guidance on how to wrap your application's native API into UFO's toolboxes.
+Before developing the native API wrappers, we strongly recommend that you read the design of the Automator.
+The Receiver
is a class that receives the native API calls from the AppAgent
and executes them. To wrap your application's native API, you need to create a Receiver
class that contains the methods to execute the native API calls.
To create a Receiver
class, follow these steps:
ufo/automator/app_api/
directory.{your_application}_client.py
.{Your_Receiver}
, inheriting from the ReceiverBasic
class located in ufo/automator/basic.py
.Your_Receiver
class with the object that executes the native API calls. For example, if your API is based on a com
object, initialize the com
object in the __init__
method of the Your_Receiver
class.Example of WinCOMReceiverBasic
class:
class WinCOMReceiverBasic(ReceiverBasic):
+ """
+ The base class for Windows COM client.
+ """
+
+ _command_registry: Dict[str, Type[CommandBasic]] = {}
+
+ def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None:
+ """
+ Initialize the Windows COM client.
+ :param app_root_name: The app root name.
+ :param process_name: The process name.
+ :param clsid: The CLSID of the COM object.
+ """
+
+ self.app_root_name = app_root_name
+ self.process_name = process_name
+ self.clsid = clsid
+ self.client = win32com.client.Dispatch(self.clsid)
+ self.com_object = self.get_object_from_process_name()
+
+Your_Receiver
class to execute the native API calls.Example of ExcelWinCOMReceiver
class:
def table2markdown(self, sheet_name: str) -> str:
+ """
+ Convert the table in the sheet to a markdown table string.
+ :param sheet_name: The sheet name.
+ :return: The markdown table string.
+ """
+
+ sheet = self.com_object.Sheets(sheet_name)
+ data = sheet.UsedRange()
+ df = pd.DataFrame(data[1:], columns=data[0])
+ df = df.dropna(axis=0, how="all")
+ df = df.applymap(self.format_value)
+
+ return df.to_markdown(index=False)
+
+APIReceiverFactory
class to manage multiple Receiver
classes that share the same API type.create_receiver
and name
methods in the ReceiverFactory
class. The create_receiver
method should return the Receiver
class.create_receiver
takes the app_root_name
and process_name
as parameters and returns the Receiver
class.ReceiverFactory
class with the decorator @ReceiverManager.register
.Example of the COMReceiverFactory
class:
from ufo.automator.puppeteer import ReceiverManager
+
+@ReceiverManager.register
+class COMReceiverFactory(APIReceiverFactory):
+ """
+ The factory class for the COM receiver.
+ """
+
+ def create_receiver(self, app_root_name: str, process_name: str) -> WinCOMReceiverBasic:
+ """
+ Create the wincom receiver.
+ :param app_root_name: The app root name.
+ :param process_name: The process name.
+ :return: The receiver.
+ """
+
+ com_receiver = self.__com_client_mapper(app_root_name)
+ clsid = self.__app_root_mappping(app_root_name)
+
+ if clsid is None or com_receiver is None:
+ # print_with_color(f"Warning: Win32COM API is not supported for {process_name}.", "yellow")
+ return None
+
+ return com_receiver(app_root_name, process_name, clsid)
+
+ @classmethod
+ def name(cls) -> str:
+ """
+ Get the name of the receiver factory.
+ :return: The name of the receiver factory.
+ """
+ return "COM"
+
+Note
+The create_receiver
method should return None
if the application is not supported.
Note
+You must register your ReceiverFactory
with the decorator @ReceiverManager.register
for the ReceiverManager
to manage the ReceiverFactory
.
The Receiver
class is now ready to receive the native API calls from the AppAgent
.
Commands are the actions that the AppAgent
can execute on the application. To create a command for the native API, you need to create a Command
class that contains the method to execute the native API calls.
Command
class in the same Python file where the Receiver
class is located. The Command
class should inherit from the CommandBasic
class located in ufo/automator/basic.py
.Example:
+class WinCOMCommand(CommandBasic):
+ """
+ The abstract command interface.
+ """
+
+ def __init__(self, receiver: WinCOMReceiverBasic, params=None) -> None:
+ """
+ Initialize the command.
+ :param receiver: The receiver of the command.
+ """
+ self.receiver = receiver
+ self.params = params if params is not None else {}
+
+ @abstractmethod
+ def execute(self):
+ pass
+
+ @classmethod
+ def name(cls) -> str:
+ """
+ Get the name of the command.
+ :return: The name of the command.
+ """
+ return cls.__name__
+
+execute
method in the Command
class to call the receiver to execute the native API calls.Example:
+def execute(self):
+ """
+ Execute the command to insert a table.
+ :return: The inserted table.
+ """
+ return self.receiver.insert_excel_table(
+ sheet_name=self.params.get("sheet_name", 1),
+ table=self.params.get("table"),
+ start_row=self.params.get("start_row", 1),
+ start_col=self.params.get("start_col", 1),
+ )
+
+3. Register the Command Class:
+Command
class in the corresponding Receiver
class using the @your_receiver.register
decorator.Example:
+@ExcelWinCOMReceiver.register
+class InsertExcelTable(WinCOMCommand):
+ ...
+
+The Command
class is now registered in the Receiver
class and available for the AppAgent
to execute the native API calls.
To let the AppAgent
know the usage of the native API calls, you need to provide prompt descriptions.
- Create an `api.yaml` file in the `ufo/prompts/apps/{your_app_name}` directory.
+
+api.yaml
file.Example:
+table2markdown:
+summary: |-
+ "table2markdown" is to get the table content in a sheet of the Excel app and convert it to markdown format.
+class_name: |-
+ GetSheetContent
+usage: |-
+ [1] API call: table2markdown(sheet_name: str)
+ [2] Args:
+ - sheet_name: The name of the sheet in the Excel app.
+ [3] Example: table2markdown(sheet_name="Sheet1")
+ [4] Available control item: Any control item in the Excel app.
+ [5] Return: the markdown format string of the table content of the sheet.
+
+Note
+The table2markdown
is the name of the native API call. It MUST
match the name()
defined in the corresponding Command
class!
config_dev.yaml
APP_API_PROMPT_ADDRESS
field of config_dev.yaml
file with the application program name as the key and the prompt file address as the value.Example:
+APP_API_PROMPT_ADDRESS: {
+ "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml",
+ "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
+ "msedge.exe": "ufo/prompts/apps/web/api.yaml",
+ "chrome.exe": "ufo/prompts/apps/web/api.yaml"
+ "your_application_program_name": "YOUR_APPLICATION_API_PROMPT"
+}
+
+Note
+The your_application_program_name
must match the name of the application program.
The AppAgent
can now use the prompt descriptions to understand the usage of the native API calls.
By following these steps, you will have successfully wrapped the native API of your application into UFO's toolboxes, allowing the AppAgent
to execute the native API calls on the application!
The instantiated plans will be executed by a execute task. After execution, evalution agent will evaluation the quality of the entire execution process.
+In this phase, given the task-action data, the execution process will match the real controller based on word environment and execute the plan step by step.
+The ExecuteFlow
class is designed to facilitate the execution and evaluation of tasks in a Windows application environment. It provides functionality to interact with the application's UI, execute predefined tasks, capture screenshots, and evaluate the results of the execution. The class also handles logging and error management for the tasks.
The task execution in the ExecuteFlow
class follows a structured sequence to ensure accurate and traceable task performance:
Retrieve or create an ExecuteAgent
for executing the task.
Plan Execution:
+instantiated_plan
. Parse the step to extract information like subtasks, control text, and the required operation.
+Action Execution:
+Capture screenshots of the application window and selected controls for logging and debugging.
+Result Logging:
+Log details of the step execution, including control information, performed action, and results.
+Finalization:
+Input of ExecuteAgent
Parameter | +Type | +Description | +
---|---|---|
name |
+str |
+The name of the agent. Used for identification and logging purposes. | +
process_name |
+str |
+The name of the application process that the agent interacts with. | +
app_root_name |
+str |
+The name of the root application window or main UI component being targeted. | +
--- | ++ | + |
The evaluation process in the ExecuteFlow
class is designed to assess the performance of the executed task based on predefined prompts:
It uses an ExecuteEvalAgent
initialized during class construction.
Perform Evaluation:
+ExecuteEvalAgent
evaluates the task using a combination of input prompts (e.g., main prompt and API prompt) and logs generated during task execution. The evaluation process outputs a result summary (e.g., quality flag, comments, and task type).
+Log and Output Results:
+
+ Bases: AppAgentProcessor
ExecuteFlow class for executing the task and saving the result.
+ +Initialize the execute flow for a task.
+ + +Parameters: | +
+
|
+
---|
execution/workflow/execute_flow.py
30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 |
|
execute(request, instantiated_plan)
+
+Execute the execute flow: Execute the task and save the result.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
execution/workflow/execute_flow.py
101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 |
|
execute_action()
+
+Execute the action.
+ +execution/workflow/execute_flow.py
306 +307 +308 +309 +310 +311 +312 +313 +314 +315 +316 +317 +318 +319 +320 +321 +322 +323 +324 +325 +326 +327 +328 +329 +330 +331 +332 +333 +334 +335 +336 +337 +338 +339 +340 +341 +342 +343 +344 +345 +346 +347 +348 +349 +350 +351 +352 +353 +354 +355 +356 +357 +358 +359 +360 +361 +362 +363 +364 +365 +366 +367 +368 +369 +370 +371 +372 +373 |
|
execute_plan(instantiated_plan)
+
+Get the executed result from the execute agent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
execution/workflow/execute_flow.py
132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 +164 +165 +166 +167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 +199 +200 +201 +202 +203 +204 +205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 |
|
general_error_handler()
+
+Handle general errors.
+ +execution/workflow/execute_flow.py
375 +376 +377 +378 +379 +380 |
|
init_and_final_capture_screenshot()
+
+Capture the screenshot.
+ +execution/workflow/execute_flow.py
285 +286 +287 +288 +289 +290 +291 +292 +293 +294 +295 +296 +297 +298 +299 +300 +301 +302 +303 +304 |
|
log_save()
+
+Log the constructed prompt message for the PrefillAgent.
+ +execution/workflow/execute_flow.py
246 +247 +248 +249 +250 +251 +252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 |
|
print_step_info()
+
+Print the step information.
+ +execution/workflow/execute_flow.py
233 +234 +235 +236 +237 +238 +239 +240 +241 +242 +243 +244 |
|
process()
+
+Process the current step.
+ +execution/workflow/execute_flow.py
221 +222 +223 +224 +225 +226 +227 +228 +229 +230 +231 |
|
+ Bases: AppAgent
The Agent for task execution.
+ +Initialize the ExecuteAgent.
+ + +Parameters: | +
+
|
+
---|
execution/agent/execute_agent.py
12 +13 +14 +15 +16 +17 +18 +19 +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 |
|
+ Bases: EvaluationAgent
The Agent for task execution evaluation.
+ +Initialize the ExecuteEvalAgent.
+ + +Parameters: | +
+
|
+
---|
execution/agent/execute_eval_agent.py
14 +15 +16 +17 +18 +19 +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 |
|
get_prompter(is_visual, prompt_template, example_prompt_template, api_prompt_template, root_name=None)
+
+Get the prompter for the agent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
execution/agent/execute_eval_agent.py
42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 |
|
There are three key steps in the instantiation process:
+Choose a template
file according to the specified app and instruction.Prefill
the task using the current screenshot.Filter
the established task.Given the initial task, the dataflow first choose a template (Phase 1
), the prefill the initial task based on word envrionment to obtain task-action data (Phase 2
). Finnally, it will filter the established task to evaluate the quality of task-action data.
Templates for your app must be defined and described in dataflow/templates/app
. For instance, if you want to instantiate tasks for the Word application, place the relevant .docx
files in dataflow /templates/word
, along with a description.json
file. The appropriate template will be selected based on how well its description matches the instruction.
The ChooseTemplateFlow
uses semantic matching, where task descriptions are compared with template descriptions using embeddings and FAISS for efficient nearest neighbor search. If semantic matching fails, a random template is chosen from the available files.
Class to select and copy the most relevant template file based on the given task context.
+ +Initialize the flow with the given task context.
+ + +Parameters: | +
+
|
+
---|
instantiation/workflow/choose_template_flow.py
27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 |
|
execute()
+
+Execute the flow and return the copied template path.
+ + +Returns: | +
+
|
+
---|
instantiation/workflow/choose_template_flow.py
43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 |
|
The PrefillFlow
class orchestrates the refinement of task plans and UI interactions by leveraging PrefillAgent
for task planning and action generation. It automates UI control updates, captures screenshots, and manages logs for messages and responses during execution.
+ Bases: AppAgentProcessor
Class to manage the prefill process by refining planning steps and automating UI interactions
+ +Initialize the prefill flow with the application context.
+ + +Parameters: | +
+
|
+
---|
instantiation/workflow/prefill_flow.py
29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 |
|
execute(template_copied_path, original_task, refined_steps)
+
+Start the execution by retrieving the instantiated result.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
instantiation/workflow/prefill_flow.py
80 + 81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 |
|
The PrefillAgent
class facilitates task instantiation and action sequence generation by constructing tailored prompt messages using the PrefillPrompter
. It integrates system, user, and dynamic context to generate actionable inputs for automation workflows.
+ Bases: BasicAgent
The Agent for task instantialization and action sequence generation.
+ +Initialize the PrefillAgent.
+ + +Parameters: | +
+
|
+
---|
instantiation/agent/prefill_agent.py
16 +17 +18 +19 +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 +41 +42 |
|
get_prompter(is_visual, main_prompt, example_prompt, api_prompt)
+
+Get the prompt for the agent. +This is the abstract method from BasicAgent that needs to be implemented.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
instantiation/agent/prefill_agent.py
44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 |
|
message_constructor(dynamic_examples, given_task, reference_steps, doc_control_state, log_path)
+
+Construct the prompt message for the PrefillAgent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
instantiation/agent/prefill_agent.py
57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 +79 +80 +81 +82 +83 +84 +85 +86 |
|
process_comfirmation()
+
+Confirm the process. +This is the abstract method from BasicAgent that needs to be implemented.
+ +instantiation/agent/prefill_agent.py
88 +89 +90 +91 +92 +93 +94 |
|
The FilterFlow
class is designed to process and refine task plans by leveraging a FilterAgent
.
Class to refine the plan steps and prefill the file based on filtering criteria.
+ +Initialize the filter flow for a task.
+ + +Parameters: | +
+
|
+
---|
instantiation/workflow/filter_flow.py
21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 |
|
execute(instantiated_request)
+
+Execute the filter flow: Filter the task and save the result.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
instantiation/workflow/filter_flow.py
51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 |
|
+ Bases: BasicAgent
The Agent to evaluate the instantiated task is correct or not.
+ +Initialize the FilterAgent.
+ + +Parameters: | +
+
|
+
---|
instantiation/agent/filter_agent.py
14 +15 +16 +17 +18 +19 +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +40 |
|
get_prompter(is_visual, main_prompt, example_prompt, api_prompt)
+
+Get the prompt for the agent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
instantiation/agent/filter_agent.py
42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 |
|
message_constructor(request, app)
+
+Construct the prompt message for the FilterAgent.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
instantiation/agent/filter_agent.py
60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 |
|
process_comfirmation()
+
+Confirm the process. +This is the abstract method from BasicAgent that needs to be implemented.
+ +instantiation/agent/filter_agent.py
80 +81 +82 +83 +84 +85 +86 |
|
This repository contains the implementation of the Data Collection process for training the Large Action Models (LAMs) in the paper of [Large Action Models: From Inception to Implementation]. The Data Collection process is designed to streamline task processing, ensuring that all necessary steps are seamlessly integrated from initialization to execution. This module is part of the UFO project.
+Dataflow uses UFO to implement instantiation
, execution
, and dataflow
for a given task, with options for batch processing and single processing.
choosing template
, prefill
and filter
.Instantiation
. And after execution, an evaluate agent will evaluate the quality of the whole execution process.You can use instantiation
and execution
independently if you only need to perform one specific part of the process. When both steps are required for a task, the dataflow
process streamlines them, allowing you to execute tasks from start to finish in a single pipeline.
The overall processing of dataflow is as below. Given a task-plan data, the LLMwill instantiatie the task-action data, including choosing template, prefill, filter.
+You should install the necessary packages in the UFO root folder:
+pip install -r requirements.txt
+
+Before running dataflow, you need to provide your LLM configurations individually for PrefillAgent and FilterAgent. You can create your own config file dataflow/config/config.yaml
, by copying the dataflow/config/config.yaml.template
and editing config for PREFILL_AGENT and FILTER_AGENT as follows:
VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.
+API_BASE: "https://api.openai.com/v1/chat/completions", # The the OpenAI API endpoint.
+API_KEY: "sk-", # The OpenAI API key, begin with sk-
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview", # The only OpenAI model
+
+VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.
+API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
+API_KEY: "YOUR_KEY", # The aoai API key
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview", # The only OpenAI model
+API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
+
+You can also non-visial model (e.g., GPT-4) for each agent, by setting VISUAL_MODE: False
and proper API_MODEL
(openai) and API_DEPLOYMENT_ID
(aoai).
You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml
file:
VISUAL_MODE: False # To enable non-visual mode.
API_MODEL
(OpenAI) and API_DEPLOYMENT_ID
(AOAI) for each agent.Ensure you configure these settings accurately to leverage non-visual models effectively.
+config_dev.yaml
specifies the paths of relevant files and contains default settings. The match strategy for the window match and control filter supports options: 'contains'
, 'fuzzy'
, and 'regex'
, allowing flexible matching strategy for users. The MAX_STEPS
is the max step for the execute_flow, which can be set by users.
Note
+The specific implementation and invocation method of the matching strategy can refer to windows_app_env.
+Note
+BE CAREFUL! If you are using GitHub or other open-source tools, do not expose your config.yaml
online, as it contains your private keys.
Certain files need to be prepared before running the task.
+The tasks that need to be instantiated should be organized in a folder of JSON files, with the default folder path set to dataflow /tasks
. This path can be changed in the dataflow/config/config.yaml
file, or you can specify it in the terminal, as mentioned in 4. Start Running. For example, a task stored in dataflow/tasks/prefill/
may look like this:
{
+ // The app you want to use
+ "app": "word",
+ // A unique ID to distinguish different tasks
+ "unique_id": "1",
+ // The task and steps to be instantiated
+ "task": "Type 'hello' and set the font type to Arial",
+ "refined_steps": [
+ "Type 'hello'",
+ "Set the font to Arial"
+ ]
+}
+
+You should place an app file as a reference for instantiation in a folder named after the app.
+For example, if you have template1.docx
for Word, it should be located at dataflow/templates/word/template1.docx
.
Additionally, for each app folder, there should be a description.json
file located at dataflow/templates/word/description.json
, which describes each template file in detail. It may look like this:
{
+ "template1.docx": "A document with a rectangle shape",
+ "template2.docx": "A document with a line of text"
+}
+
+If a description.json
file is not present, one template file will be selected at random.
Ensure the following files are in place:
+The structure of the files can be:
+dataflow/
+|
+├── tasks
+│ └── prefill
+│ ├── bulleted.json
+│ ├── delete.json
+│ ├── draw.json
+│ ├── macro.json
+│ └── rotate.json
+├── templates
+│ └── word
+│ ├── description.json
+│ ├── template1.docx
+│ ├── template2.docx
+│ ├── template3.docx
+│ ├── template4.docx
+│ ├── template5.docx
+│ ├── template6.docx
+│ └── template7.docx
+└── ...
+
+After finishing the previous steps, you can use the following commands in the command line. We provide single / batch process, for which you need to give the single file path / folder path. Determine the type of path provided by the user and automatically decide whether to process a single task or batch tasks.
+Also, you can choose to use instantiation
/ execution
sections individually, or use them as a whole section, which is named as dataflow
.
The default task hub is set to be "TASKS_HUB"
in dataflow/config_dev.yaml
.
python -m dataflow -dataflow --task_path path_to_task_file
+
+python -m dataflow -instantiation --task_path path_to_task_file
+
+python -m dataflow -execution --task_path path_to_task_file
+
+There are three key steps in the instantiation process:
+Choose a template
file according to the specified app and instruction.Prefill
the task using the current screenshot.Filter
the established task.Given the initial task, the dataflow first choose a template (Phase 1
), the prefill the initial task based on word envrionment to obtain task-action data (Phase 2
). Finnally, it will filter the established task to evaluate the quality of task-action data.
Templates for your app must be defined and described in dataflow/templates/app
. For instance, if you want to instantiate tasks for the Word application, place the relevant .docx
files in dataflow /templates/word
, along with a description.json
file.
The appropriate template will be selected based on how well its description matches the instruction.
+After selecting the template file, it will be opened, and a screenshot will be taken. If the template file is currently in use, errors may occur.
+The screenshot will be sent to the action prefill agent, which will return a modified task.
+The completed task will be evaluated by a filter agent, which will assess it and provide feedback.
+The more detailed code design documentation for instantiation can be found in instantiation.
+The instantiated plans will be executed by a execute task. After execution, evalution agent will evaluation the quality of the entire execution process.
+In this phase, given the task-action data, the execution process will match the real controller based on word environment and execute the plan step by step.
+The more detailed code design documentation for execution can be found in execution.
+The structure of the results of the task is as below:
+UFO/
+├── dataflow/ # Root folder for dataflow
+│ └── results/ # Directory for storing task processing results
+│ ├── saved_document/ # Directory for final document results
+│ ├── instantiation/ # Directory for instantiation results
+│ │ ├── instantiation_pass/ # Tasks successfully instantiated
+│ │ └── instantiation_fail/ # Tasks that failed instantiation
+│ ├── execution/ # Directory for execution results
+│ │ ├── execution_pass/ # Tasks successfully executed
+│ │ ├── execution_fail/ # Tasks that failed execution
+│ │ └── execution_unsure/ # Tasks with uncertain execution results
+│ ├── dataflow/ # Directory for dataflow results
+│ │ ├── execution_pass/ # Tasks successfully executed
+│ │ ├── execution_fail/ # Tasks that failed execution
+│ │ └── execution_unsure/ # Tasks with uncertain execution results
+│ └── ...
+└── ...
+
+This directory structure organizes the results of task processing into specific categories, including instantiation, execution, and dataflow outcomes. +2. Instantiation:
+The instantiation
directory contains subfolders for tasks that were successfully instantiated (instantiation_pass
) and those that failed during instantiation (instantiation_fail
).
+3. Execution:
Results of task execution are stored under the execution
directory, categorized into successful tasks (execution_pass
), failed tasks (execution_fail
), and tasks with uncertain outcomes (execution_unsure
).
+4. Dataflow Results:
The dataflow
directory similarly holds results of tasks based on execution success, failure, or uncertainty, providing a comprehensive view of the data processing pipeline.
+5. Saved Documents:
Instantiated results are separately stored in the saved_document
directory for easy access and reference.
This section illustrates the structure of the result of the task, organized in a hierarchical format to describe the various fields and their purposes. The result data include unique_id
,app
, original
, execution_result
, instantiation_result
, time_cost
.
string
, array
, object
) clearly specifies the format of the data.{
+ "unique_id": "102",
+ "app": "word",
+ "original": {
+ "original_task": "Find which Compatibility Mode you are in for Word",
+ "original_steps": [
+ "1.Click the **File** tab.",
+ "2.Click **Info**.",
+ "3.Check the **Compatibility Mode** indicator at the bottom of the document preview pane."
+ ]
+ },
+ "execution_result": {
+ "result": {
+ "reason": "The agent successfully identified the compatibility mode of the Word document.",
+ "sub_scores": {
+ "correct identification of compatibility mode": "yes"
+ },
+ "complete": "yes"
+ },
+ "error": null
+ },
+ "instantiation_result": {
+ "choose_template": {
+ "result": "dataflow\\results\\saved_document\\102.docx",
+ "error": null
+ },
+ "prefill": {
+ "result": {
+ "instantiated_request": "Identify the Compatibility Mode of the Word document.",
+ "instantiated_plan": [
+ {
+ "Step": 1,
+ "Subtask": "Identify the Compatibility Mode",
+ "Function": "summary",
+ "Args": {
+ "text": "The document is in '102 - Compatibility Mode'."
+ },
+ "Success": true
+ }
+ ]
+ },
+ "error": null
+ },
+ "instantiation_evaluation": {
+ "result": {
+ "judge": true,
+ "thought": "Identifying the Compatibility Mode of a Word document is a task that can be executed locally within Word."
+ },
+ "error": null
+ }
+ },
+ "time_cost": {
+ "choose_template": 0.017,
+ "prefill": 11.304,
+ "instantiation_evaluation": 2.38,
+ "total": 34.584,
+ "execute": 0.946,
+ "execute_eval": 10.381
+ }
+}
+
+We prepare two cases to show the dataflow, which can be found in dataflow\tasks\prefill
. So after installing required packages, you can type the following command in the command line:
python -m dataflow -dataflow
+
+And you can see the hints showing in the terminal, which means the dataflow is working.
+After the two tasks are finished, the task and output files would appear as follows:
+UFO/
+├── dataflow/
+│ └── results/
+│ ├── saved_document/ # Directory for saved documents
+│ │ ├── bulleted.docx # Result of the "bulleted" task
+│ │ └── rotate.docx # Result of the "rotate" task
+│ ├── dataflow/ # Dataflow results directory
+│ │ ├── execution_pass/ # Successfully executed tasks
+│ │ │ ├── bulleted.json # Execution result for the "bulleted" task
+│ │ │ ├── rotate.json # Execution result for the "rotate" task
+│ │ │ └── ...
+└── ...
+
+The result stucture of bulleted task is shown as below. This document provides a detailed breakdown of the task execution process for turning lines of text into a bulleted list in Word. It includes the original task description, execution results, and time analysis for each step.
+unique_id
: The identifier for the task, in this case, "5"
.app
: The application being used, which is "word"
.original
: Contains the original task description and the steps.
original_task
: Describes the task in simple terms (turning text into a bulleted list).
original_steps
: Lists the steps required to perform the task.execution_result
: Provides the result of executing the task.
result
: Describes the outcome of the execution, including a success message and sub-scores for each part of the task. The complete: "yes"
means the evaluation agent think the execution process is successful! The sub_score
is the evaluation of each subtask, corresponding to the instantiated_plan
in the prefill
.
error
: If any error occurred during execution, it would be reported here, but it's null
in this case.instantiation_result
: Details the instantiation of the task (setting up the task for execution).
choose_template
: Path to the template or document created during the task (in this case, the bulleted list document).
prefill
: Describes the instantiated_request
and instantiated_plan
and the steps involved, such as selecting text and clicking buttons, which is the result of prefill flow. The Success
and MatchedControlText
is added in the execution process. Success
indicates whether the subtask was executed successfully. MatchedControlText
refers to the control text that was matched during the execution process based on the plan.instantiation_evaluation
: Provides feedback on the task's feasibility and the evaluation of the request, which is result of the filter flow. "judge": true
: This indicates that the evaluation of the task was positive, meaning the task is considered valid or successfully judged. And the thought
is the detailed reason.time_cost
: The time spent on different parts of the task, including template selection, prefill, instantiation evaluation, and execution. Total time is also given.This structure follows your description and provides the necessary details in a consistent format.
+{
+ "unique_id": "5",
+ "app": "word",
+ "original": {
+ "original_task": "Turning lines of text into a bulleted list in Word",
+ "original_steps": [
+ "1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list",
+ "2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style"
+ ]
+ },
+ "execution_result": {
+ "result": {
+ "reason": "The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.",
+ "sub_scores": {
+ "text selection": "yes",
+ "bulleted list conversion": "yes"
+ },
+ "complete": "yes"
+ },
+ "error": null
+ },
+ "instantiation_result": {
+ "choose_template": {
+ "result": "dataflow\\results\\saved_document\\bulleted.docx",
+ "error": null
+ },
+ "prefill": {
+ "result": {
+ "instantiated_request": "Turn the line of text 'text to edit' into a bulleted list in Word.",
+ "instantiated_plan": [
+ {
+ "Step": 1,
+ "Subtask": "Place the cursor at the beginning of the text 'text to edit'",
+ "ControlLabel": null,
+ "ControlText": "",
+ "Function": "select_text",
+ "Args": {
+ "text": "text to edit"
+ },
+ "Success": true,
+ "MatchedControlText": null
+ },
+ {
+ "Step": 2,
+ "Subtask": "Click the Bullets button in the Paragraph group on the Home tab",
+ "ControlLabel": "61",
+ "ControlText": "Bullets",
+ "Function": "click_input",
+ "Args": {
+ "button": "left",
+ "double": false
+ },
+ "Success": true,
+ "MatchedControlText": "Bullets"
+ }
+ ]
+ },
+ "error": null
+ },
+ "instantiation_evaluation": {
+ "result": {
+ "judge": true,
+ "thought": "The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.",
+ "request_type": "None"
+ },
+ "error": null
+ }
+ },
+ "time_cost": {
+ "choose_template": 0.012,
+ "prefill": 15.649,
+ "instantiation_evaluation": 2.469,
+ "execute": 5.824,
+ "execute_eval": 8.702,
+ "total": 43.522
+ }
+}
+
+The corresponding logs can be found in the directories logs/bulleted
and logs/rotate
, as shown below. Detailed logs for each workflow are recorded, capturing every step of the execution process.
+ Bases: Enum
Enum class for applications.
+ +Initialize the application enum.
+ + +Parameters: | +
+
|
+
---|
dataflow/data_flow_controller.py
47 +48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 |
|
Initialize the task object.
+ + +Parameters: | +
+
|
+
---|
dataflow/data_flow_controller.py
64 +65 +66 +67 +68 +69 +70 +71 +72 +73 +74 +75 +76 +77 +78 |
|
Flow controller class to manage the instantiation and execution process.
+ +Initialize the flow controller.
+ + +Parameters: | +
+
|
+
---|
dataflow/data_flow_controller.py
116 +117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 |
|
instantiated_plan: List[Dict[str, Any]]
+
+
+ property
+ writable
+
+
+Get the instantiated plan from the task information.
+ + +Returns: | +
+
|
+
---|
template_copied_path: str
+
+
+ property
+
+
+Get the copied template path from the task information.
+ + +Returns: | +
+
|
+
---|
execute_execution(request, plan)
+
+Execute the execution process.
+ + +Parameters: | +
+
|
+
---|
dataflow/data_flow_controller.py
205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 +220 +221 +222 +223 +224 +225 +226 +227 +228 +229 +230 +231 +232 +233 +234 +235 +236 +237 +238 +239 +240 +241 +242 +243 +244 +245 +246 +247 +248 +249 |
|
execute_instantiation()
+
+Execute the instantiation process.
+ + +Returns: | +
+
|
+
---|
dataflow/data_flow_controller.py
173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 +199 +200 +201 +202 +203 |
|
init_task_info()
+
+Initialize the task information.
+ + +Returns: | +
+
|
+
---|
dataflow/data_flow_controller.py
134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 +145 +146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 |
|
instantiation_single_flow(flow_class, flow_type, init_params=None, execute_params=None)
+
+Execute a single flow process in the instantiation phase.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
dataflow/data_flow_controller.py
252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 +268 +269 +270 +271 +272 +273 +274 +275 +276 +277 +278 +279 +280 +281 +282 +283 +284 +285 |
|
run()
+
+Run the instantiation and execution process.
+ +dataflow/data_flow_controller.py
360 +361 +362 +363 +364 +365 +366 +367 +368 +369 +370 +371 +372 +373 +374 +375 +376 +377 +378 +379 +380 +381 +382 +383 +384 +385 +386 +387 +388 +389 |
|
save_result()
+
+Validate and save the instantiated task result.
+ +dataflow/data_flow_controller.py
287 +288 +289 +290 +291 +292 +293 +294 +295 +296 +297 +298 +299 +300 +301 +302 +303 +304 +305 +306 +307 +308 +309 +310 +311 +312 +313 +314 +315 +316 +317 +318 +319 +320 +321 +322 +323 +324 +325 +326 +327 +328 +329 +330 |
|
Note
+This schema defines the structure of a JSON object that might be used to represent the results of task instantiation
.
unique_id
: A string serving as the unique identifier for the task.app
: A string representing the application where the task is being executed.original
: An object containing details about the original task.unique_id
string
Purpose: Provides a globally unique identifier for the task.
+app
string
Purpose: Specifies the application associated with the task execution.
+original
object
Contains the following fields:
+original_task
: string
original_steps
: array
of string
Required fields: original_task
, original_steps
execution_result
object
or null
result
: Always null
, indicating no execution results are included. error
: Always null
, implying execution errors are not tracked in this schema. Purpose: Simplifies the structure by omitting detailed execution results.
+instantiation_result
object
Contains fields detailing the results of task instantiation:
+choose_template
: object
result
: A string or null
, representing the outcome of template selection. error
: A string or null
, detailing any errors during template selection. result
, error
prefill
: object
or null
result
: object
or null
instantiated_request
: A string, representing the generated request. instantiated_plan
: An array or null
, listing instantiation steps:Step
: An integer representing the sequence of the step. Subtask
: A string describing the subtask. ControlLabel
: A string or null
, representing the control label. ControlText
: A string, providing context for the step. Function
: A string, specifying the function executed at this step. Args
: An object, containing any arguments required by the function. Step
, Subtask
, Function
, Args
instantiated_request
, instantiated_plan
error
: A string or null
, describing errors encountered during prefill. result
, error
instantiation_evaluation
: object
result
: object
or null
judge
: A boolean, indicating whether the instantiation is valid. thought
: A string, providing reasoning or observations. request_type
: A string, classifying the request type. judge
, thought
, request_type
error
: A string or null
, indicating errors during evaluation. result
, error
time_cost
object
choose_template
: A number or null
, time spent selecting a template. prefill
: A number or null
, time used for pre-filling. instantiation_evaluation
: A number or null
, time spent on evaluation. total
: A number or null
, total time cost for all processes. choose_template
, prefill
, instantiation_evaluation
, total
{
+ "unique_id": "5",
+ "app": "word",
+ "original": {
+ "original_task": "Turning lines of text into a bulleted list in Word",
+ "original_steps": [
+ "1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list",
+ "2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style"
+ ]
+ },
+ "execution_result": {
+ "result": null,
+ "error": null
+ },
+ "instantiation_result": {
+ "choose_template": {
+ "result": "dataflow\\results\\saved_document\\bulleted.docx",
+ "error": null
+ },
+ "prefill": {
+ "result": {
+ "instantiated_request": "Turn the line of text 'text to edit' into a bulleted list in Word.",
+ "instantiated_plan": [
+ {
+ "Step": 1,
+ "Subtask": "Place the cursor at the beginning of the text 'text to edit'",
+ "ControlLabel": null,
+ "ControlText": "",
+ "Function": "select_text",
+ "Args": {
+ "text": "text to edit"
+ }
+ },
+ {
+ "Step": 2,
+ "Subtask": "Click the Bullets button in the Paragraph group on the Home tab",
+ "ControlLabel": null,
+ "ControlText": "Bullets",
+ "Function": "click_input",
+ "Args": {
+ "button": "left",
+ "double": false
+ }
+ }
+ ]
+ },
+ "error": null
+ },
+ "instantiation_evaluation": {
+ "result": {
+ "judge": true,
+ "thought": "The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.",
+ "request_type": "None"
+ },
+ "error": null
+ }
+ },
+ "time_cost": {
+ "choose_template": 0.012,
+ "prefill": 15.649,
+ "instantiation_evaluation": 2.469,
+ "execute": null,
+ "execute_eval": null,
+ "total": 18.130
+ }
+}
+
+This schema defines the structure of a JSON object that might be used to represent the results of task execution
or dataflow
. Below are the main fields and their detailed descriptions.
Unlike the instantiation result, the execution result schema provides detailed feedback on execution, including success metrics (reason
, sub_scores
). Additionally, based on the original instantiated_plan, each step has been enhanced with the fields Success
and MatchedControlText
, which represent whether the step executed successfully (success is indicated by no errors) and the name of the last matched control, respectively. The ControlLabel
will also be updated to reflect the final selected ControlLabel.
unique_id
Type: string
app
Type: string
original
Type: object
original_task
:string
original_steps
:array
execution_result
Type: object
or null
result
:object
or null
reason
: The reason for the execution result, type string
.sub_scores
: A set of sub-scores, represented as key-value pairs (.*
allows any key pattern).complete
: Indicates the completion status, type string
.error
:object
or null
type
: The type of error, type string
.message
: The error message, type string
.traceback
: The error traceback, type string
.instantiation_result
Type: object
choose_template
:object
result
: The result of template selection, type string
or null
.error
: Error information, type null
or string
.prefill
:object
or null
result
:object
or null
instantiated_request
: The instantiated task request, type string
.instantiated_plan
: The instantiated task plan, type array
or null
.Step
: Step number, type integer
.Subtask
: Description of the subtask, type string
.ControlLabel
: Control label, type string
or null
.ControlText
: Control text, type string
.Function
: Function name, type string
.Args
: Arguments to the function, type object
.Success
: Whether the step succeeded, type boolean
or null
.MatchedControlText
: Matched control text, type string
or null
.error
: Prefill error information, type null
or string
.instantiation_evaluation
:object
result
:object
or null
judge
: Whether the evaluation succeeded, type boolean
.thought
: Evaluator's thoughts, type string
.request_type
: The type of request, type string
.error
: Evaluation error information, type null
or string
.time_cost
Type: object
choose_template
: Time spent selecting the template, type number
or null
.prefill
: Time spent in the prefill phase, type number
or null
.instantiation_evaluation
: Time spent in instantiation evaluation, type number
or null
.total
: Total time cost, type number
or null
.execute
: Time spent in execution, type number
or null
.execute_eval
: Time spent in execution evaluation, type number
or null
.The fields unique_id
, app
, original
, execution_result
, instantiation_result
, and time_cost
are required for the JSON object to be valid.
{
+ "unique_id": "5",
+ "app": "word",
+ "original": {
+ "original_task": "Turning lines of text into a bulleted list in Word",
+ "original_steps": [
+ "1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list",
+ "2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style"
+ ]
+ },
+ "execution_result": {
+ "result": {
+ "reason": "The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.",
+ "sub_scores": {
+ "text selection": "yes",
+ "bulleted list conversion": "yes"
+ },
+ "complete": "yes"
+ },
+ "error": null
+ },
+ "instantiation_result": {
+ "choose_template": {
+ "result": "dataflow\\results\\saved_document\\bulleted.docx",
+ "error": null
+ },
+ "prefill": {
+ "result": {
+ "instantiated_request": "Turn the line of text 'text to edit' into a bulleted list in Word.",
+ "instantiated_plan": [
+ {
+ "Step": 1,
+ "Subtask": "Place the cursor at the beginning of the text 'text to edit'",
+ "ControlLabel": null,
+ "ControlText": "",
+ "Function": "select_text",
+ "Args": {
+ "text": "text to edit"
+ },
+ "Success": true,
+ "MatchedControlText": null
+ },
+ {
+ "Step": 2,
+ "Subtask": "Click the Bullets button in the Paragraph group on the Home tab",
+ "ControlLabel": "61",
+ "ControlText": "Bullets",
+ "Function": "click_input",
+ "Args": {
+ "button": "left",
+ "double": false
+ },
+ "Success": true,
+ "MatchedControlText": "Bullets"
+ }
+ ]
+ },
+ "error": null
+ },
+ "instantiation_evaluation": {
+ "result": {
+ "judge": true,
+ "thought": "The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.",
+ "request_type": "None"
+ },
+ "error": null
+ }
+ },
+ "time_cost": {
+ "choose_template": 0.012,
+ "prefill": 15.649,
+ "instantiation_evaluation": 2.469,
+ "execute": 5.824,
+ "execute_eval": 8.702,
+ "total": 43.522
+ }
+}
+
+
+ WindowsAppEnv
class represents the environment for controlling a Windows application. It provides methods for starting, stopping, and interacting with Windows applications, including window matching based on configurable strategies.
In the WindowsAppEnv
class, matching strategies are rules that determine how to match window
or control
names with a given document name or target text. Based on the configuration file, three different matching strategies can be selected: contains
, fuzzy
, and regex
.
Contains
Matching is the simplest strategy, suitable when the window and document names match exactly.Fuzzy
Matching is more flexible and can match even when there are spelling errors or partial matches between the window title and document name.s
Matching offers the most flexibility, ideal for complex matching patterns in window titles.The method find_matching_window
is responsible for matching windows based on the configured matching strategy. Here's how you can use it to find a window by providing a document name:
# Initialize your application object (assuming app_object is already defined)
+app_env = WindowsAppEnv(app_object)
+
+# Define the document name you're looking for
+doc_name = "example_document_name"
+
+# Call find_matching_window to find the window that matches the document name
+matching_window = app_env.find_matching_window(doc_name)
+
+if matching_window:
+ print(f"Found matching window: {matching_window.element_info.name}")
+else:
+ print("No matching window found.")
+
+app_env.find_matching_window(doc_name)
will search through all open windows and match the window title using the strategy defined in the configuration (contains, fuzzy, or regex).matching_window
object will contain the matched window, and you can print the window's name.None
.To find a matching control within a window, you can use the find_matching_controller
method. This method requires a dictionary of filtered controls and a control text to match against.
# Initialize your application object (assuming app_object is already defined)
+app_env = WindowsAppEnv(app_object)
+
+# Define a filtered annotation dictionary of controls (control_key, control_object)
+# Here, we assume you have a dictionary of UIAWrapper controls from a window.
+filtered_annotation_dict = {
+ 1: some_control_1, # Example control objects
+ 2: some_control_2, # Example control objects
+}
+
+# Define the control text you're searching for
+control_text = "submit_button"
+
+# Call find_matching_controller to find the best match
+controller_key, control_selected = app_env.find_matching_controller(filtered_annotation_dict, control_text)
+
+if control_selected:
+ print(f"Found matching control with key {controller_key}: {control_selected.window_text()}")
+else:
+ print("No matching control found.")
+
+filtered_annotation_dict
is a dictionary where the key represents the control's ID and the value is the control object (UIAWrapper
).control_text
is the text you're searching for within those controls.app_env.find_matching_controller(filtered_annotation_dict, control_text)
will calculate the matching score for each control based on the defined strategy and return the control with the highest match score.control_selected
) and its key (controller_key
), which can be used for further interaction.Represents the Windows Application Environment.
+ +Initializes the Windows Application Environment.
+ + +Parameters: | +
+
|
+
---|
env/env_manager.py
29 +30 +31 +32 +33 +34 +35 +36 +37 +38 |
|
close()
+
+Tries to gracefully close the application; if it fails or is not closed, forcefully terminates the process.
+ +env/env_manager.py
57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 |
|
find_matching_controller(filtered_annotation_dict, control_text)
+
+" +Select the best matched controller.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
env/env_manager.py
156 +157 +158 +159 +160 +161 +162 +163 +164 +165 +166 +167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 |
|
find_matching_window(doc_name)
+
+Finds a matching window based on the process name and the configured matching strategy.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
env/env_manager.py
90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 +100 +101 +102 +103 +104 |
|
start(copied_template_path)
+
+Starts the Windows environment.
+ + +Parameters: | +
+
|
+
---|
env/env_manager.py
40 +41 +42 +43 +44 +45 +46 +47 +48 +49 +50 +51 +52 +53 +54 +55 |
|
We provide answers to some frequently asked questions about the UFO.
+A: UFO stands for UI Focused agent. The name is inspired by the concept of an unidentified flying object (UFO) that is mysterious and futuristic.
+A: UFO is currently only supported on Windows OS.
+A: The latency of UFO depends on the response time of the LLMs and the network speed. If you are using GPT, it usually takes dozens of seconds to generate a response in one step. The workload of the GPT endpoint may also affect the latency.
+A: UFO supports various language models, including OpenAI and Azure OpenAI models, QWEN, google Gimini, Ollama, and more. You can find the full list of supported models in the Supported Models
section of the documentation.
A: Yes, you can use non-vision models in UFO. You can set the VISUAL_MODE
to False
in the config.yaml
file to disable the visual mode and use non-vision models. However, UFO is designed to work with vision models, and using non-vision models may affect the performance.
A: Yes, you can host your custom LLM endpoint and configure UFO to use it. Check the documentation in the Supported Models
section for more details.
A: It depends on the language model you are using. Most of LLMs support multiple languages, and you can specify the language in the request. However, the performance may vary for different languages.
+Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
?A: This means the LLM endpoint is not accessible. You can check the network connection (e.g. VPN) and the status of the LLM endpoint.
+Info
+To get more support, please submit an issue on the GitHub Issues, or send an email to ufo-agent@microsoft.com.
+If you are a user of UFO, and want to use it to automate your tasks on Windows, you can refer to User Configuration to set up your environment and start using UFO.
+For instance, except for configuring the HOST_AGENT
and APP_AGENT
, you can also configure the LLM parameters and RAG parameters in the config.yaml
file to enhance the UFO agent with additional knowledge sources.
If you are a developer who wants to contribute to UFO, you can take a look at the Developer Configuration to explore the development environment setup and the development workflow.
+You can also refer to the Project Structure to understand the project structure and the role of each component in UFO, and use the rest of the documentation to understand the architecture and design of UFO. Taking a look at the Session and Round can help you understand the core logic of UFO.
+For debugging and testing, it is recommended to check the log files in the ufo/logs
directory to track the execution of UFO and identify any issues that may arise.
UFO requires Python >= 3.10 running on Windows OS >= 10. It can be installed by running the following command:
+# [optional to create conda environment]
+# conda create -n ufo python=3.10
+# conda activate ufo
+
+# clone the repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+# install the requirements
+pip install -r requirements.txt
+# If you want to use the Qwen as your LLMs, uncomment the related libs.
+
+Before running UFO, you need to provide your LLM configurations individually for HostAgent and AppAgent. You can create your own config file ufo/config/config.yaml
, by copying the ufo/config/config.yaml.template
and editing config for APP_AGENT and ACTION_AGENT as follows:
VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.
+API_BASE: "https://api.openai.com/v1/chat/completions", # The the OpenAI API endpoint.
+API_KEY: "sk-", # The OpenAI API key, begin with sk-
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview", # The OpenAI model
+
+VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.
+API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
+API_KEY: "YOUR_KEY", # The aoai API key
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview", # The OpenAI model
+API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
+
+You can also non-visial model (e.g., GPT-4) for each agent, by setting VISUAL_MODE: False
and proper API_MODEL
(openai) and API_DEPLOYMENT_ID
(aoai). You can also optionally set an backup LLM engine in the field of BACKUP_AGENT
if the above engines failed during the inference. The API_MODEL
can be any GPT models that can accept images as input.
You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml
file:
Info
+VISUAL_MODE: False
API_MODEL
(OpenAI) and API_DEPLOYMENT_ID
(AOAI) for each agent.Optionally, you can set a backup language model (LLM) engine in the BACKUP_AGENT
field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively.
Note
+UFO also supports other LLMs and advanced configurations, such as customize your own model, please check the documents for more details. Because of the limitations of model input, a lite version of the prompt is provided to allow users to experience it, which is configured in config_dev.yaml
.
If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the ufo/config/config.yaml
file.
We provide the following options for RAG to enhance UFO's capabilities:
+Offline Help Document: Enable UFO to retrieve information from offline help documents.
+Online Bing Search Engine: Enhance UFO's capabilities by utilizing the most up-to-date online search results.
+Self-Experience: Save task completion trajectories into UFO's memory for future reference.
+User-Demonstration: Boost UFO's capabilities through user demonstration.
+Tip
+Consult their respective documentation for more information on how to configure these settings.
+# assume you are in the cloned UFO folder
+python -m ufo --task <your_task_name>
+
+This will start the UFO process and you can interact with it through the command line interface. +If everything goes well, you will see the following message:
+Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction.
+ _ _ _____ ___
+| | | || ___| / _ \
+| | | || |_ | | | |
+| |_| || _| | |_| |
+ \___/ |_| \___/
+Please enter your request to be completed🛸:
+
+You can find the screenshots taken and request & response logs in the following folder:
+./ufo/logs/<your_task_name>/
+
+You may use them to debug, replay, or analyze the agent output.
+Note
+Before UFO executing your request, please make sure the targeted applications are active on the system.
+Note
+The GPT-V accepts screenshots of your desktop and application GUI as input. Please ensure that no sensitive or confidential information is visible or captured during the execution process. For further information, refer to DISCLAIMER.md.
+UFO is a UI-Focused multi-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications.
+UFO operates as a multi-agent framework, encompassing:
+HostAgent 🤖, tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application.
+AppAgent 👾, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.
+Application Automator 🎮, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and through UI controls, native APIs or AI tools. Check out more details here.
+Both agents leverage the multi-modal capabilities of Visual Language Model (VLM) to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report.
+Please follow the Quick Start Guide to get started with UFO.
+Check out our official deep dive of UFO on this Youtube Video.
+UFO sightings have garnered attention from various media outlets, including:
+Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience
+🚀 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo🌌
++
Our technical report paper can be found here. Note that previous HostAgent and AppAgent in the paper are renamed to HostAgent and AppAgent in the code base to better reflect their functions. +If you use UFO in your research, please cite our paper:
+@article{ufo,
+ title={{UFO: A UI-Focused Agent for Windows OS Interaction}},
+ author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
+ journal={arXiv preprint arXiv:2402.07939},
+ year={2024}
+}
+
+If you're interested in data analytics agent frameworks, check out TaskWeaver, a code-first LLM agent framework designed for seamlessly planning and executing data analytics tasks.
+For more information on GUI agents, refer to our survey paper: Large Language Model-Brained GUI Agents: A Survey. You can also explore the survey through: +- GitHub Repository +- Searchable Website
+ + + + +The evaluation logs store the evaluation results from the EvaluationAgent
. The evaluation log contains the following information:
Field | +Description | +Type | +
---|---|---|
Reason | +The detailed reason for your judgment, by observing the screenshot differences and the |
+String | +
Sub-score | +The sub-score of the evaluation in decomposing the evaluation into multiple sub-goals. | +List of Dictionaries | +
Complete | +The completion status of the evaluation, can be yes , no , or unsure . |
+String | +
level | +The level of the evaluation. | +String | +
request | +The request sent to the EvaluationAgent . |
+Dictionary | +
id | +The ID of the evaluation. | +Integer | +
Logs are essential for debugging and understanding the behavior of the UFO framework. There are three types of logs generated by UFO:
+Log Type | +Description | +Location | +Level | +
---|---|---|---|
Request Log | +Contains the prompt requests to LLMs. | +logs/{task_name}/request.log |
+Info | +
Step Log | +Contains the agent's response to the user's request and additional information at every step. | +logs/{task_name}/response.log |
+Info | +
Evaluation Log | +Contains the evaluation results from the EvaluationAgent . |
+logs/{task_name}/evaluation.log |
+Info | +
Screenshots | +Contains the screenshots of the application UI. | +logs/{task_name}/ |
+- | +
All logs are stored in the logs/{task_name}
directory.
The request is the prompt requests to the LLMs. The request log is stored in the request.log
file. The request log contains the following information for each step:
Field | +Description | +
---|---|
step |
+The step number of the session. | +
prompt |
+The prompt message sent to the LLMs. | +
The request log is stored at the debug
level. You can configure the logging level in the LOG_LEVEL
field in the config_dev.yaml
file.
Tip
+You can use the following python code to read the request log:
+import json
+
+with open('logs/{task_name}/request.log', 'r') as f:
+ for line in f:
+ log = json.loads(line)
+
+UFO also save desktop or application screenshots for debugging and evaluation purposes. The screenshot logs are stored in the logs/{task_name}/
.
There are 4 types of screenshot logs generated by UFO, as detailed below.
+At each step, UFO saves a clean screenshot of the desktop or application. The clean screenshot is saved in the action_step{step_number}.png
file. In addition, the clean screenshots are also saved when a sub-task, round or session is completed. The clean screenshots are saved in the action_round_{round_id}_sub_round_{sub_task_id}_final.png
, action_round_{round_id}_final.png
and action_step_final.png
files, respectively. Below is an example of a clean screenshot.
UFO also saves annotated screenshots of the application, with each control item is annotated with a number, following the Set-of-Mark paradigm. The annotated screenshots are saved in the action_step{step_number}_annotated.png
file. Below is an example of an annotated screenshot.
Info
+Only selected types of controls are annotated in the screenshots. They are configured in the config_dev.yaml
file under the CONTROL_LIST
field.
Tip
+Different types of controls are annotated with different colors. You can configure the colors in the config_dev.yaml
file under the ANNOTATION_COLORS
field.
UFO also saves concatenated screenshots of the application, with clean and annotated screenshots concatenated side by side. The concatenated screenshots are saved in the action_step{step_number}_concat.png
file. Below is an example of a concatenated screenshot.
Info
+You can configure whether to feed the concatenated screenshots to the LLMs, or separate clean and annotated screenshots, in the config_dev.yaml
file under the CONCAT_SCREENSHOT
field.
UFO saves screenshots of the selected control item for operation. The selected control screenshots are saved in the action_step{step_number}_selected_controls.png
file. Below is an example of a selected control screenshot.
Info
+You can configure whether to feed LLM with the selected control screenshots at the previous step to enhance the context, in the config_dev.yaml
file under the INCLUDE_LAST_SCREENSHOT
field.
The step log contains the agent's response to the user's request and additional information at every step. The step log is stored in the response.log
file. The log fields are different for HostAgent
and AppAgent
. The step log is at the info
level.
The HostAgent
logs contain the following fields:
Field | +Description | +Type | +
---|---|---|
Observation | +The observation of current desktop screenshots. | +String | +
Thought | +The logical reasoning process of the HostAgent . |
+String | +
Current Sub-Task | +The current sub-task to be executed by the AppAgent . |
+String | +
Message | +The message to be sent to the AppAgent for the completion of the sub-task. |
+String | +
ControlLabel | +The index of the selected application to execute the sub-task. | +String | +
ControlText | +The name of the selected application to execute the sub-task. | +String | +
Plan | +The plan for the following sub-tasks after the current sub-task. | +List of Strings | +
Status | +The status of the agent, mapped to the AgentState . |
+String | +
Comment | +Additional comments or information provided to the user. | +String | +
Questions | +The questions to be asked to the user for additional information. | +List of Strings | +
Bash | +The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. |
+String | +
Field | +Description | +Type | +
---|---|---|
Step | +The step number of the session. | +Integer | +
RoundStep | +The step number of the current round. | +Integer | +
AgentStep | +The step number of the HostAgent . |
+Integer | +
Round | +The round number of the session. | +Integer | +
ControlLabel | +The index of the selected application to execute the sub-task. | +Integer | +
ControlText | +The name of the selected application to execute the sub-task. | +String | +
Request | +The user request. | +String | +
Agent | +The agent that executed the step, set to HostAgent . |
+String | +
AgentName | +The name of the agent. | +String | +
Application | +The application process name. | +String | +
Cost | +The cost of the step. | +Float | +
Results | +The results of the step, set to an empty string. | +String | +
CleanScreenshot | +The image path of the desktop screenshot. | +String | +
AnnotatedScreenshot | +The image path of the annotated application screenshot. | +String | +
ConcatScreenshot | +The image path of the concatenated application screenshot. | +String | +
SelectedControlScreenshot | +The image path of the selected control screenshot. | +String | +
time_cost | +The time cost of each step in the process. | +Dictionary | +
The AppAgent
logs contain the following fields:
Field | +Description | +Type | +
---|---|---|
Observation | +The observation of the current application screenshots. | +String | +
Thought | +The logical reasoning process of the AppAgent . |
+String | +
ControlLabel | +The index of the selected control to interact with. | +String | +
ControlText | +The name of the selected control to interact with. | +String | +
Function | +The function to be executed on the selected control. | +String | +
Args | +The arguments required for the function execution. | +List of Strings | +
Status | +The status of the agent, mapped to the AgentState . |
+String | +
Plan | +The plan for the following steps after the current action. | +List of Strings | +
Comment | +Additional comments or information provided to the user. | +String | +
SaveScreenshot | +The flag to save the screenshot of the application to the blackboard for future reference. |
+Boolean | +
Field | +Description | +Type | +
---|---|---|
Step | +The step number of the session. | +Integer | +
RoundStep | +The step number of the current round. | +Integer | +
AgentStep | +The step number of the AppAgent . |
+Integer | +
Round | +The round number of the session. | +Integer | +
Subtask | +The sub-task to be executed by the AppAgent . |
+String | +
SubtaskIndex | +The index of the sub-task in the current round. | +Integer | +
Action | +The action to be executed by the AppAgent . |
+String | +
ActionType | +The type of the action to be executed. | +String | +
Request | +The user request. | +String | +
Agent | +The agent that executed the step, set to AppAgent . |
+String | +
AgentName | +The name of the agent. | +String | +
Application | +The application process name. | +String | +
Cost | +The cost of the step. | +Float | +
Results | +The results of the step. | +String | +
CleanScreenshot | +The image path of the desktop screenshot. | +String | +
AnnotatedScreenshot | +The image path of the annotated application screenshot. | +String | +
ConcatScreenshot | +The image path of the concatenated application screenshot. | +String | +
time_cost | +The time cost of each step in the process. | +Dictionary | +
Tip
+You can use the following python code to read the request log:
+import json
+
+with open('logs/{task_name}/request.log', 'r') as f:
+ for line in f:
+ log = json.loads(line)
+
+Info
+The FollowerAgent
logs share the same fields as the AppAgent
logs.
UFO can save the entire UI tree of the application window at every step for data collection purposes. The UI tree can represent the application's UI structure, including the window, controls, and their properties. The UI tree logs are saved in the logs/{task_name}/ui_tree
folder. You have to set the SAVE_UI_TREE
flag to True
in the config_dev.yaml
file to enable the UI tree logs. Below is an example of the UI tree logs for application:
{
+ "id": "node_0",
+ "name": "Mail - Chaoyun Zhang - Outlook",
+ "control_type": "Window",
+ "rectangle": {
+ "left": 628,
+ "top": 258,
+ "right": 3508,
+ "bottom": 1795
+ },
+ "adjusted_rectangle": {
+ "left": 0,
+ "top": 0,
+ "right": 2880,
+ "bottom": 1537
+ },
+ "relative_rectangle": {
+ "left": 0.0,
+ "top": 0.0,
+ "right": 1.0,
+ "bottom": 1.0
+ },
+ "level": 0,
+ "children": [
+ {
+ "id": "node_1",
+ "name": "",
+ "control_type": "Pane",
+ "rectangle": {
+ "left": 3282,
+ "top": 258,
+ "right": 3498,
+ "bottom": 330
+ },
+ "adjusted_rectangle": {
+ "left": 2654,
+ "top": 0,
+ "right": 2870,
+ "bottom": 72
+ },
+ "relative_rectangle": {
+ "left": 0.9215277777777777,
+ "top": 0.0,
+ "right": 0.9965277777777778,
+ "bottom": 0.0468445022771633
+ },
+ "level": 1,
+ "children": []
+ }
+ ]
+}
+
+Below is a table of the fields in the UI tree logs:
+Field | +Description | +Type | +
---|---|---|
id | +The unique identifier of the UI tree node. | +String | +
name | +The name of the UI tree node. | +String | +
control_type | +The type of the UI tree node. | +String | +
rectangle | +The absolute position of the UI tree node. | +Dictionary | +
adjusted_rectangle | +The adjusted position of the UI tree node. | +Dictionary | +
relative_rectangle | +The relative position of the UI tree node. | +Dictionary | +
level | +The level of the UI tree node. | +Integer | +
children | +The children of the UI tree node. | +List of UI tree nodes | +
A class to represent the UI tree.
+ +Initialize the UI tree with the root element.
+ + +Parameters: | +
+
|
+
---|
automator/ui_control/ui_tree.py
20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +30 +31 +32 +33 |
|
ui_tree: Dict[str, Any]
+
+
+ property
+
+
+The UI tree.
+apply_ui_tree_diff(ui_tree_1, diff)
+
+
+ staticmethod
+
+
+Apply a UI tree diff to ui_tree_1 to get ui_tree_2.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/ui_tree.py
224 +225 +226 +227 +228 +229 +230 +231 +232 +233 +234 +235 +236 +237 +238 +239 +240 +241 +242 +243 +244 +245 +246 +247 +248 +249 +250 +251 +252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 +268 +269 +270 +271 +272 +273 +274 +275 +276 +277 +278 +279 +280 +281 +282 +283 +284 +285 +286 +287 +288 +289 +290 +291 +292 +293 +294 +295 +296 +297 +298 +299 +300 +301 +302 +303 +304 +305 +306 +307 +308 +309 +310 +311 +312 +313 +314 +315 +316 +317 +318 +319 +320 +321 +322 +323 |
|
flatten_ui_tree()
+
+Flatten the UI tree into a list in width-first order.
+ +automator/ui_control/ui_tree.py
117 +118 +119 +120 +121 +122 +123 +124 +125 +126 +127 +128 +129 +130 +131 +132 +133 +134 +135 +136 +137 +138 +139 +140 +141 +142 +143 +144 |
|
save_ui_tree_to_json(file_path)
+
+Save the UI tree to a JSON file.
+ + +Parameters: | +
+
|
+
---|
automator/ui_control/ui_tree.py
103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 |
|
ui_tree_diff(ui_tree_1, ui_tree_2)
+
+
+ staticmethod
+
+
+Compute the difference between two UI trees.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
automator/ui_control/ui_tree.py
146 +147 +148 +149 +150 +151 +152 +153 +154 +155 +156 +157 +158 +159 +160 +161 +162 +163 +164 +165 +166 +167 +168 +169 +170 +171 +172 +173 +174 +175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 +192 +193 +194 +195 +196 +197 +198 +199 +200 +201 +202 +203 +204 +205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 +219 +220 +221 +222 |
|
Note
+Save the UI tree logs may increase the latency of the system. It is recommended to set the SAVE_UI_TREE
flag to False
when you do not need the UI tree logs.
The Context
object is a shared state object that stores the state of the conversation across all Rounds
within a Session
. It is used to maintain the context of the conversation, as well as the overall status of the conversation.
The attributes of the Context
object are defined in the ContextNames
class, which is an Enum
. The ContextNames
class specifies various context attributes used throughout the session. Below is the definition:
class ContextNames(Enum):
+ """
+ The context names.
+ """
+
+ ID = "ID" # The ID of the session
+ MODE = "MODE" # The mode of the session
+ LOG_PATH = "LOG_PATH" # The folder path to store the logs
+ REQUEST = "REQUEST" # The current request
+ SUBTASK = "SUBTASK" # The current subtask processed by the AppAgent
+ PREVIOUS_SUBTASKS = "PREVIOUS_SUBTASKS" # The previous subtasks processed by the AppAgent
+ HOST_MESSAGE = "HOST_MESSAGE" # The message from the HostAgent sent to the AppAgent
+ REQUEST_LOGGER = "REQUEST_LOGGER" # The logger for the LLM request
+ LOGGER = "LOGGER" # The logger for the session
+ EVALUATION_LOGGER = "EVALUATION_LOGGER" # The logger for the evaluation
+ ROUND_STEP = "ROUND_STEP" # The step of all rounds
+ SESSION_STEP = "SESSION_STEP" # The step of the current session
+ CURRENT_ROUND_ID = "CURRENT_ROUND_ID" # The ID of the current round
+ APPLICATION_WINDOW = "APPLICATION_WINDOW" # The window of the application
+ APPLICATION_PROCESS_NAME = "APPLICATION_PROCESS_NAME" # The process name of the application
+ APPLICATION_ROOT_NAME = "APPLICATION_ROOT_NAME" # The root name of the application
+ CONTROL_REANNOTATION = "CONTROL_REANNOTATION" # The re-annotation of the control provided by the AppAgent
+ SESSION_COST = "SESSION_COST" # The cost of the session
+ ROUND_COST = "ROUND_COST" # The cost of all rounds
+ ROUND_SUBTASK_AMOUNT = "ROUND_SUBTASK_AMOUNT" # The amount of subtasks in all rounds
+ CURRENT_ROUND_STEP = "CURRENT_ROUND_STEP" # The step of the current round
+ CURRENT_ROUND_COST = "CURRENT_ROUND_COST" # The cost of the current round
+ CURRENT_ROUND_SUBTASK_AMOUNT = "CURRENT_ROUND_SUBTASK_AMOUNT" # The amount of subtasks in the current round
+ STRUCTURAL_LOGS = "STRUCTURAL_LOGS" # The structural logs of the session
+
+Each attribute is a string that represents a specific aspect of the session context, ensuring that all necessary information is accessible and manageable within the application.
+Attribute | +Description | +
---|---|
ID |
+The ID of the session. | +
MODE |
+The mode of the session. | +
LOG_PATH |
+The folder path to store the logs. | +
REQUEST |
+The current request. | +
SUBTASK |
+The current subtask processed by the AppAgent. | +
PREVIOUS_SUBTASKS |
+The previous subtasks processed by the AppAgent. | +
HOST_MESSAGE |
+The message from the HostAgent sent to the AppAgent. | +
REQUEST_LOGGER |
+The logger for the LLM request. | +
LOGGER |
+The logger for the session. | +
EVALUATION_LOGGER |
+The logger for the evaluation. | +
ROUND_STEP |
+The step of all rounds. | +
SESSION_STEP |
+The step of the current session. | +
CURRENT_ROUND_ID |
+The ID of the current round. | +
APPLICATION_WINDOW |
+The window of the application. | +
APPLICATION_PROCESS_NAME |
+The process name of the application. | +
APPLICATION_ROOT_NAME |
+The root name of the application. | +
CONTROL_REANNOTATION |
+The re-annotation of the control provided by the AppAgent. | +
SESSION_COST |
+The cost of the session. | +
ROUND_COST |
+The cost of all rounds. | +
ROUND_SUBTASK_AMOUNT |
+The amount of subtasks in all rounds. | +
CURRENT_ROUND_STEP |
+The step of the current round. | +
CURRENT_ROUND_COST |
+The cost of the current round. | +
CURRENT_ROUND_SUBTASK_AMOUNT |
+The amount of subtasks in the current round. | +
STRUCTURAL_LOGS |
+The structural logs of the session. | +
Context
objectThe context class that maintains the context for the session and agent.
+ + + + + + + + + +current_round_cost: Optional[float]
+
+
+ property
+ writable
+
+
+Get the current round cost.
+current_round_step: int
+
+
+ property
+ writable
+
+
+Get the current round step.
+current_round_subtask_amount: int
+
+
+ property
+ writable
+
+
+Get the current round subtask index.
+add_to_structural_logs(data)
+
+Add data to the structural logs.
+ + +Parameters: | +
+
|
+
---|
module/context.py
274 +275 +276 +277 +278 +279 +280 +281 +282 +283 +284 +285 +286 +287 +288 +289 |
|
filter_structural_logs(round_key, subtask_key, keys)
+
+Filter the structural logs.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
module/context.py
291 +292 +293 +294 +295 +296 +297 +298 +299 +300 +301 +302 +303 +304 +305 +306 +307 +308 +309 +310 +311 |
|
get(key)
+
+Get the value from the context.
+ + +Parameters: | +
+
|
+
---|
Returns: | +
+
|
+
---|
module/context.py
165 +166 +167 +168 +169 +170 +171 +172 +173 |
|
set(key, value)
+
+Set the value in the context.
+ + +Parameters: | +
+
|
+
---|
module/context.py
175 +176 +177 +178 +179 +180 +181 +182 +183 +184 +185 +186 +187 +188 +189 +190 +191 |
|
to_dict()
+
+Convert the context to a dictionary.
+ + +Returns: | +
+
|
+
---|
module/context.py
313 +314 +315 +316 +317 +318 |
|
update_dict(key, value)
+
+Add a dictionary to a context key. The value and the context key should be dictionaries.
+ + +Parameters: | +
+
|
+
---|
module/context.py
203 +204 +205 +206 +207 +208 +209 +210 +211 +212 +213 +214 +215 +216 +217 +218 |
|
A Round
is a single interaction between the user and UFO that processes a single user request. A Round
is responsible for orchestrating the HostAgent
and AppAgent
to fulfill the user's request.
In a Round
, the following steps are executed:
At the beginning of a Round
, the Round
object is created, and the user's request is processed by the HostAgent
to determine the appropriate application to fulfill the request.
Once created, the Round
orchestrates the HostAgent
and AppAgent
to execute the necessary actions to fulfill the user's request. The core logic of a Round
is shown below:
def run(self) -> None:
+ """
+ Run the round.
+ """
+
+ while not self.is_finished():
+
+ self.agent.handle(self.context)
+
+ self.state = self.agent.state.next_state(self.agent)
+ self.agent = self.agent.state.next_agent(self.agent)
+ self.agent.set_state(self.state)
+
+ # If the subtask ends, capture the last snapshot of the application.
+ if self.state.is_subtask_end():
+ time.sleep(configs["SLEEP_TIME"])
+ self.capture_last_snapshot(sub_round_id=self.subtask_amount)
+ self.subtask_amount += 1
+
+ self.agent.blackboard.add_requests(
+ {"request_{i}".format(i=self.id), self.request}
+ )
+
+ if self.application_window is not None:
+ self.capture_last_snapshot()
+
+ if self._should_evaluate:
+ self.evaluation()
+
+At each step, the Round
processes the user's request by invoking the handle
method of the AppAgent
or HostAgent
based on the current state. The state determines the next agent to handle the request and the next state to transition to.
The AppAgent
completes the actions within the application. If the request spans multiple applications, the HostAgent
may switch to a different application to continue the task.
Once the user's request is fulfilled, the Round
is terminated, and the results are returned to the user. If configured, the EvaluationAgent
evaluates the completeness of the Round
.
+ Bases: ABC
A round of a session in UFO. +A round manages a single user request and consists of multiple steps. +A session may consists of multiple rounds of interactions.
+ +Initialize a round.
+ + +Parameters: | +
+
|
+
---|
module/basic.py
48 +49 +50 +51 +52 +53 +54 +55 +56 +57 +58 +59 +60 +61 +62 +63 +64 +65 +66 +67 +68 +69 +70 +71 +72 |
|
agent: BasicAgent
+
+
+ property
+ writable
+
+
+Get the agent of the round. +return: The agent of the round.
+application_window: UIAWrapper
+
+
+ property
+ writable
+
+
+Get the application of the session. +return: The application of the session.
+context: Context
+
+
+ property
+
+
+Get the context of the round. +return: The context of the round.
+cost: float
+
+
+ property
+
+
+Get the cost of the round. +return: The cost of the round.
+id: int
+
+
+ property
+
+
+Get the id of the round. +return: The id of the round.
+log_path: str
+
+
+ property
+
+
+Get the log path of the round.
+return: The log path of the round.
+request: str
+
+
+ property
+
+
+Get the request of the round. +return: The request of the round.
+state: AgentState
+
+
+ property
+ writable
+
+
+Get the status of the round. +return: The status of the round.
+step: int
+
+
+ property
+
+
+Get the local step of the round. +return: The step of the round.
+subtask_amount: int
+
+
+ property
+ writable
+
+
+Get the subtask amount of the round. +return: The subtask amount of the round.
+capture_last_snapshot(sub_round_id=None)
+
+Capture the last snapshot of the application, including the screenshot and the XML file if configured.
+ + +Parameters: | +
+
|
+
---|
module/basic.py
246 +247 +248 +249 +250 +251 +252 +253 +254 +255 +256 +257 +258 +259 +260 +261 +262 +263 +264 +265 +266 +267 +268 +269 +270 +271 +272 +273 +274 +275 +276 +277 +278 +279 +280 +281 +282 +283 +284 +285 +286 +287 +288 +289 +290 +291 +292 +293 +294 +295 +296 +297 +298 +299 +300 +301 +302 +303 +304 +305 +306 +307 +308 +309 +310 |
|
evaluation()
+
+TODO: Evaluate the round.
+ +module/basic.py
312 +313 +314 +315 +316 |
|
is_finished()
+
+Check if the round is finished. +return: True if the round is finished, otherwise False.
+ +module/basic.py
127 +128 +129 +130 +131 +132 +133 +134 +135 |
|
print_cost()
+
+Print the total cost of the round.
+ +module/basic.py
225 +226 +227 +228 +229 +230 +231 +232 +233 +234 +235 |
|
run()
+
+Run the round.
+ +module/basic.py
98 + 99 +100 +101 +102 +103 +104 +105 +106 +107 +108 +109 +110 +111 +112 +113 +114 +115 +116 +117 +118 +119 +120 +121 +122 +123 +124 +125 |
|
A Session
is a conversation instance between the user and UFO. It is a continuous interaction that starts when the user initiates a request and ends when the request is completed. UFO supports multiple requests within the same session. Each request is processed sequentially, by a Round
of interaction, until the user's request is fulfilled. We show the relationship between Session
and Round
in the following figure:
The lifecycle of a Session
is as follows:
A Session
is initialized when the user starts a conversation with UFO. The Session
object is created, and the first Round
of interaction is initiated. At this stage, the user's request is processed by the HostAgent
to determine the appropriate application to fulfill the request. The Context
object is created to store the state of the conversation shared across all Rounds
within the Session
.
Once the Session
is initialized, the Round
of interaction begins, which completes a single user request by orchestrating the HostAgent
and AppAgent
.
After the completion of the first Round
, the Session
requests the next request from the user to start the next Round
of interaction. This process continues until there are no more requests from the user.
+The core logic of a Session
is shown below:
def run(self) -> None:
+ """
+ Run the session.
+ """
+
+ while not self.is_finished():
+
+ round = self.create_new_round()
+ if round is None:
+ break
+ round.run()
+
+ if self.application_window is not None:
+ self.capture_last_snapshot()
+
+ if self._should_evaluate and not self.is_error():
+ self.evaluation()
+
+ self.print_cost()
+
+If the user has no more requests or decides to end the conversation, the Session
is terminated, and the conversation ends. The EvaluationAgent
evaluates the completeness of the Session
if it is configured to do so.
+ Bases: ABC
A basic session in UFO. A session consists of multiple rounds of interactions and conversations.
+ +Initialize a session.
+ + +Parameters: | +
+
|
+
---|
module/basic.py
340 +341 +342 +343 +344 +345 +346 +347 +348 +349 +350 +351 +352 +353 +354 +355 +356 +357 +358 +359 +360 +361 +362 +363 +364 +365 +366 +367 +368 |
|
application_window: UIAWrapper
+
+
+ property
+ writable
+
+
+Get the application of the session. +return: The application of the session.
+context: Context
+
+
+ property
+
+
+Get the context of the session. +return: The context of the session.
+cost: float
+
+
+ property
+ writable
+
+
+Get the cost of the session. +return: The cost of the session.
+current_round: BaseRound
+
+
+ property
+
+
+Get the current round of the session. +return: The current round of the session.
+evaluation_logger: logging.Logger
+
+
+ property
+
+
+Get the logger for evaluation. +return: The logger for evaluation.
+id: int
+
+
+ property
+
+
+Get the id of the session. +return: The id of the session.
+rounds: Dict[int, BaseRound]
+
+
+ property
+
+
+Get the rounds of the session. +return: The rounds of the session.
+session_type: str
+
+
+ property
+
+
+Get the class name of the session. +return: The class name of the session.
+step: int
+
+
+ property
+
+
+Get the step of the session. +return: The step of the session.
+total_rounds: int
+
+
+ property
+
+
+Get the total number of rounds in the session. +return: The total number of rounds in the session.
+add_round(id, round)
+
+Add a round to the session.
+ + +Parameters: | +
+
|
+
---|
module/basic.py
412 +413 +414 +415 +416 +417 +418 |
|
capture_last_snapshot()
+
+Capture the last snapshot of the application, including the screenshot and the XML file if configured.
+ +module/basic.py
660 +661 +662 +663 +664 +665 +666 +667 +668 +669 +670 +671 +672 +673 +674 +675 +676 +677 +678 +679 +680 +681 +682 +683 +684 +685 +686 +687 +688 +689 +690 +691 +692 +693 +694 +695 +696 +697 +698 +699 +700 +701 +702 |
|
create_following_round()
+
+Create a following round. +return: The following round.
+ +module/basic.py
405 +406 +407 +408 +409 +410 |
|
create_new_round()
+
+
+ abstractmethod
+
+
+Create a new round.
+ +module/basic.py
390 +391 +392 +393 +394 +395 |
|
evaluation()
+
+Evaluate the session.
+ +module/basic.py
612 +613 +614 +615 +616 +617 +618 +619 +620 +621 +622 +623 +624 +625 +626 +627 +628 +629 +630 +631 +632 +633 +634 +635 +636 +637 +638 +639 +640 +641 +642 +643 +644 +645 +646 +647 +648 +649 +650 |
|
experience_saver()
+
+Save the current trajectory as agent experience.
+ +module/basic.py
534 +535 +536 +537 +538 +539 +540 +541 +542 +543 +544 +545 +546 +547 +548 +549 +550 +551 +552 +553 +554 +555 +556 +557 +558 +559 +560 +561 |
|
initialize_logger(log_path, log_filename, mode='a', configs=configs)
+
+
+ staticmethod
+
+
+Initialize logging. +log_path: The path of the log file. +log_filename: The name of the log file. +return: The logger.
+ +module/basic.py
704 +705 +706 +707 +708 +709 +710 +711 +712 +713 +714 +715 +716 +717 +718 +719 +720 +721 +722 +723 +724 +725 +726 |
|
is_error()
+
+Check if the session is in error state. +return: True if the session is in error state, otherwise False.
+ +module/basic.py
582 +583 +584 +585 +586 +587 +588 +589 |
|
is_finished()
+
+Check if the session is ended. +return: True if the session is ended, otherwise False.
+ +module/basic.py
591 +592 +593 +594 +595 +596 +597 +598 +599 +600 +601 +602 |
|
next_request()
+
+
+ abstractmethod
+
+
+Get the next request of the session. +return: The request of the session.
+ +module/basic.py
397 +398 +399 +400 +401 +402 +403 |
|
print_cost()
+
+Print the total cost of the session.
+ +module/basic.py
563 +564 +565 +566 +567 +568 +569 +570 +571 +572 +573 +574 +575 +576 +577 +578 +579 +580 |
|
request_to_evaluate()
+
+
+ abstractmethod
+
+
+Get the request to evaluate. +return: The request(s) to evaluate.
+ +module/basic.py
604 +605 +606 +607 +608 +609 +610 |
|
run()
+
+Run the session.
+ +module/basic.py
370 +371 +372 +373 +374 +375 +376 +377 +378 +379 +380 +381 +382 +383 +384 +385 +386 +387 +388 |
|
The UFO project is organized into a well-defined directory structure to facilitate development, deployment, and documentation. Below is an overview of each directory and file, along with their purpose:
+📦project
+ ┣ 📂documents # Folder to store project documentation
+ ┣ 📂learner # Folder to build the vector database for help documents
+ ┣ 📂model_worker # Folder to store tools for deploying your own model
+ ┣ 📂record_processor # Folder to parse human demonstrations from Windows Step Recorder and build the vector database
+ ┣ 📂vetordb # Folder to store all data in the vector database for RAG (Retrieval-Augmented Generation)
+ ┣ 📂logs # Folder to store logs, generated after the program starts
+ ┗ 📂ufo # Directory containing main project code
+ ┣ 📂module # Directory for the basic module of UFO, e.g., session and round
+ ┣ 📂agents # Code implementation of agents in UFO
+ ┣ 📂automator # Implementation of the skill set of agents to automate applications
+ ┣ 📂experience # Parse and save the agent's self-experience
+ ┣ 📂llm # Folder to store the LLM (Large Language Model) implementation
+ ┣ 📂prompter # Prompt constructor for the agent
+ ┣ 📂prompts # Prompt templates and files to construct the full prompt
+ ┣ 📂rag # Implementation of RAG from different sources to enhance agents' abilities
+ ┣ 📂utils # Utility functions
+ ┣ 📂config # Configuration files
+ ┣ 📜config.yaml # User configuration file for LLM and other settings
+ ┣ 📜config_dev.yaml # Configuration file for developers
+ ┗ ...
+ ┗ 📄ufo.py # Main entry point for the UFO client
+
+Details: This directory is further subdivided into multiple subdirectories, each serving a specific purpose within the project.
+config.yaml.template
to config.yaml
and edit the configuration settings as needed.The API prompts provide the description and usage of the APIs used in UFO. Shared APIs and app-specific APIs are stored in different directories:
+Directory | +Description | +
---|---|
ufo/prompts/share/base/api.yaml |
+Shared APIs used by multiple applications | +
ufo/prompts/{app_name} |
+APIs specific to an application | +
Info
+You can configure the API prompt used in the config.yaml
file. You can find more information about the configuration file here.
Tip
+You may customize the API prompt for a specific application by adding the API prompt in the application's directory.
+Below is an example of an API prompt:
+click_input:
+ summary: |-
+ "click_input" is to click the control item with mouse.
+ class_name: |-
+ ClickInputCommand
+ usage: |-
+ [1] API call: click_input(button: str, double: bool)
+ [2] Args:
+ - button: 'The mouse button to click. One of ''left'', ''right'', ''middle'' or ''x'' (Default: ''left'')'
+ - double: 'Whether to perform a double click or not (Default: False)'
+ [3] Example: click_input(button="left", double=False)
+ [4] Available control item: All control items.
+ [5] Return: None
+
+To create a new API prompt, follow the template above and add it to the appropriate directory.
+ +The basic prompt template is a fixed format that is used to generate prompts for the HostAgent
, AppAgent
, FollowerAgent
, and EvaluationAgent
. It include the template for the system
and user
roles to construct the agent's prompt.
Below is the default file path for the basic prompt template:
+Agent | +File Path | +Version | +
---|---|---|
HostAgent | +ufo/prompts/share/base/host_agent.yaml | +base | +
HostAgent | +ufo/prompts/share/lite/host_agent.yaml | +lite | +
AppAgent | +ufo/prompts/share/base/app_agent.yaml | +base | +
AppAgent | +ufo/prompts/share/lite/app_agent.yaml | +lite | +
FollowerAgent | +ufo/prompts/share/base/app_agent.yaml | +base | +
FollowerAgent | +ufo/prompts/share/lite/app_agent.yaml | +lite | +
EvaluationAgent | +ufo/prompts/evaluation/evaluation_agent.yaml | +- | +
Info
+You can configure the prompt template used in the config.yaml
file. You can find more information about the configuration file here.
The example prompts are used to generate textual demonstration examples for in-context learning. The examples are stored in the ufo/prompts/examples
directory, with the following subdirectories:
Directory | +Description | +
---|---|
lite |
+Lite version of demonstration examples | +
non-visual |
+Examples for non-visual LLMs | +
visual |
+Examples for visual LLMs | +
Info
+You can configure the example prompt used in the config.yaml
file. You can find more information about the configuration file here.
Below are examples for the HostAgent
and AppAgent
:
Request: |-
+ Summarize and add all to do items on Microsoft To Do from the meeting notes email, and write a summary on the meeting_notes.docx.
+Response:
+ Observation: |-
+ The current screenshot shows the Microsoft To Do application is visible, and outlook application and the meeting_notes.docx are available in the list of applications.
+ Thought: |-
+ The user request can be decomposed into three sub-tasks: (1) Summarize all to do items on Microsoft To Do from the meeting_notes email, (2) Add all to do items to Microsoft To Do, and (3) Write a summary on the meeting_notes.docx. I need to open the Microsoft To Do application to complete the first two sub-tasks.
+ Each sub-task will be completed in individual applications sequentially.
+ CurrentSubtask: |-
+ Summarized all to do items from the meeting notes email in Outlook.
+ Message:
+ - (1) You need to first search for the meeting notes email in Outlook to summarize.
+ - (2) Only summarize the to do items from the meeting notes email, without any redundant information.
+ ControlLabel: |-
+ 16
+ ControlText: |-
+ Mail - Outlook - Jim
+ Status: |-
+ CONTINUE
+ Plan:
+ - Add all to do items previously summarized from the meeting notes email to one-by-one Microsoft To Do.
+ - Write a summary about the meeting notes email on the meeting_notes.docx.
+ Comment: |-
+ I plan to first summarize all to do items from the meeting notes email in Outlook.
+ Questions: []
+
+Request: |-
+ How many stars does the Imdiffusion repo have?
+Sub-task: |-
+ Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually.
+Response:
+ Observation: |-
+ I observe that the Edge browser is visible in the screenshot, with the Google search page opened.
+ Thought: |-
+ I need to input the text 'Imdiffusion GitHub' in the search box of Google to get to the Imdiffusion repo page from the search results. The search box is usually in a type of ComboBox.
+ ControlLabel: |-
+ 36
+ ControlText: |-
+ 搜索
+ Function: |-
+ set_edit_text
+ Args:
+ {"text": "Imdiffusion GitHub"}
+ Status: |-
+ CONTINUE
+ Plan:
+ - (1) After input 'Imdiffusion GitHub', click Google Search to search for the Imdiffusion repo on github.
+ - (2) Once the searched results are visible, click the Imdiffusion repo Hyperlink in the searched results to open the repo page.
+ - (3) Observing and summarize the number of stars the Imdiffusion repo page, and reply to the user request.
+ Comment: |-
+ I plan to use Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually.
+ SaveScreenshot:
+ {"save": false, "reason": ""}
+Tips: |-
+ - The search box is usually in a type of ComboBox.
+ - The number of stars of a Github repo page can be found in the repo page visually.
+
+These examples regulate the output format of the agent's response and provide a structured way to generate demonstration examples for in-context learning.
+ +All prompts used in UFO are stored in the ufo/prompts
directory. The folder structure is as follows:
📦prompts
+ ┣ 📂apps # Stores API prompts for specific applications
+ ┣ 📂excel # Stores API prompts for Excel
+ ┣ 📂word # Stores API prompts for Word
+ ┗ ...
+ ┣ 📂demonstration # Stores prompts for summarizing demonstrations from humans using Step Recorder
+ ┣ 📂experience # Stores prompts for summarizing the agent's self-experience
+ ┣ 📂evaluation # Stores prompts for the EvaluationAgent
+ ┣ 📂examples # Stores demonstration examples for in-context learning
+ ┣ 📂lite # Lite version of demonstration examples
+ ┣ 📂non-visual # Examples for non-visual LLMs
+ ┗ 📂visual # Examples for visual LLMs
+ ┗ 📂share # Stores shared prompts
+ ┣ 📂lite # Lite version of shared prompts
+ ┗ 📂base # Basic version of shared prompts
+ ┣ 📜api.yaml # Basic API prompt
+ ┣ 📜app_agent.yaml # Basic AppAgent prompt template
+ ┗ 📜host_agent.yaml # Basic HostAgent prompt template
+
+Note
+The lite
version of prompts is a simplified version of the full prompts, which is used for LLMs that have a limited token budget. However, the lite
version is not fully optimized and may lead to suboptimal performance.
Note
+The non-visual
and visual
folders contain examples for non-visual and visual LLMs, respectively.
Prompts used an agent usually contain the following information:
+Prompt | +Description | +
---|---|
Basic template |
+A basic template for the agent prompt. | +
API |
+A prompt for all skills and APIs used by the agent. | +
Examples |
+Demonstration examples for the agent for in-context learning. | +
You can find these prompts share
directory. The prompts for specific applications are stored in the apps
directory.
Tip
+All information is constructed using the agent's Prompter
class. You can find more details about the Prompter
class in the documentation here.
' + escapeHtml(summary) +'
' + noResultsText + '
'); + } +} + +function doSearch () { + var query = document.getElementById('mkdocs-search-query').value; + if (query.length > min_search_length) { + if (!window.Worker) { + displayResults(search(query)); + } else { + searchWorker.postMessage({query: query}); + } + } else { + // Clear results for short queries + displayResults([]); + } +} + +function initSearch () { + var search_input = document.getElementById('mkdocs-search-query'); + if (search_input) { + search_input.addEventListener("keyup", doSearch); + } + var term = getSearchTermFromLocation(); + if (term) { + search_input.value = term; + doSearch(); + } +} + +function onWorkerMessage (e) { + if (e.data.allowSearch) { + initSearch(); + } else if (e.data.results) { + var results = e.data.results; + displayResults(results); + } else if (e.data.config) { + min_search_length = e.data.config.min_search_length-1; + } +} + +if (!window.Worker) { + console.log('Web Worker API not supported'); + // load index in main thread + $.getScript(joinUrl(base_url, "search/worker.js")).done(function () { + console.log('Loaded worker'); + init(); + window.postMessage = function (msg) { + onWorkerMessage({data: msg}); + }; + }).fail(function (jqxhr, settings, exception) { + console.error('Could not load worker.js'); + }); +} else { + // Wrap search in a web worker + var searchWorker = new Worker(joinUrl(base_url, "search/worker.js")); + searchWorker.postMessage({init: true}); + searchWorker.onmessage = onWorkerMessage; +} diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 00000000..8c3f3e12 --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to UFO's Document! \u2002 \u2002 \u2002 \u2002 \u2002 Introduction UFO is a UI-Focused multi-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications. \ud83d\udd4c Framework UFO operates as a multi-agent framework, encompassing: HostAgent \ud83e\udd16 , tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application. AppAgent \ud83d\udc7e , responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. Application Automator \ud83c\udfae , is tasked with translating actions from HostAgent and AppAgent into interactions with the application and through UI controls, native APIs or AI tools. Check out more details here . Both agents leverage the multi-modal capabilities of Visual Language Model (VLM) to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report . \ud83d\ude80 Quick Start Please follow the Quick Start Guide to get started with UFO. \ud83d\udca5 Highlights First Windows Agent - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS. Agent as an Expert - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including offline help documents, online search engines, and human demonstrations, making the agent an application \"expert\". Rich Skill Set - UFO is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native API, and \"Copilot\". Interactive Mode - UFO facilitates multiple sub-requests from users within the same session, enabling the seamless completion of complex tasks. Agent Customization - UFO allows users to customize their own agents by providing additional information. The agent will proactively query users for details when necessary to better tailor its behavior. Scalable AppAgent Creation - UFO offers extensibility, allowing users and app developers to create their own AppAgents in an easy and scalable way. \ud83c\udf10 Media Coverage Check out our official deep dive of UFO on this Youtube Video . UFO sightings have garnered attention from various media outlets, including: Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience \ud83d\ude80 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo\ud83c\udf0c The AI PC - The Future of Computers? - Microsoft UFO \u4e0b\u4e00\u4ee3Windows\u7cfb\u7edf\u66dd\u5149\uff1a\u57fa\u4e8eGPT-4V\uff0cAgent\u8de8\u5e94\u7528\u8c03\u5ea6\uff0c\u4ee3\u53f7UFO \u4e0b\u4e00\u4ee3\u667a\u80fd\u7248 Windows \u8981\u6765\u4e86\uff1f\u5fae\u8f6f\u63a8\u51fa\u9996\u4e2a Windows Agent\uff0c\u547d\u540d\u4e3a UFO\uff01 Microsoft\u767a\u306e\u30aa\u30fc\u30d7\u30f3\u30bd\u30fc\u30b9\u7248\u300cUFO\u300d\u767b\u5834\uff01\u3000Windows\u3092\u81ea\u52d5\u64cd\u7e26\u3059\u308bAI\u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u3092\u8a66\u3059 \u2753Get help \u2754GitHub Issues (prefered) For other communications, please contact ufo-agent@microsoft.com \ud83d\udcda Citation Our technical report paper can be found here . Note that previous HostAgent and AppAgent in the paper are renamed to HostAgent and AppAgent in the code base to better reflect their functions. If you use UFO in your research, please cite our paper: @article{ufo, title={{UFO: A UI-Focused Agent for Windows OS Interaction}}, author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi}, journal={arXiv preprint arXiv:2402.07939}, year={2024} } \ud83c\udfa8 Related Projects If you're interested in data analytics agent frameworks, check out TaskWeaver , a code-first LLM agent framework designed for seamlessly planning and executing data analytics tasks. For more information on GUI agents, refer to our survey paper: Large Language Model-Brained GUI Agents: A Survey . You can also explore the survey through: - GitHub Repository - Searchable Website window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-FX17ZGJYGC');","title":"Home"},{"location":"#welcome-to-ufos-document","text":"","title":"Welcome to UFO's Document!"},{"location":"#introduction","text":"UFO is a UI-Focused multi-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications.","title":"Introduction"},{"location":"#framework","text":"UFO operates as a multi-agent framework, encompassing: HostAgent \ud83e\udd16 , tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application. AppAgent \ud83d\udc7e , responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. Application Automator \ud83c\udfae , is tasked with translating actions from HostAgent and AppAgent into interactions with the application and through UI controls, native APIs or AI tools. Check out more details here . Both agents leverage the multi-modal capabilities of Visual Language Model (VLM) to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report .","title":"\ud83d\udd4c Framework"},{"location":"#quick-start","text":"Please follow the Quick Start Guide to get started with UFO.","title":"\ud83d\ude80 Quick Start"},{"location":"#highlights","text":"First Windows Agent - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS. Agent as an Expert - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including offline help documents, online search engines, and human demonstrations, making the agent an application \"expert\". Rich Skill Set - UFO is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native API, and \"Copilot\". Interactive Mode - UFO facilitates multiple sub-requests from users within the same session, enabling the seamless completion of complex tasks. Agent Customization - UFO allows users to customize their own agents by providing additional information. The agent will proactively query users for details when necessary to better tailor its behavior. Scalable AppAgent Creation - UFO offers extensibility, allowing users and app developers to create their own AppAgents in an easy and scalable way.","title":"\ud83d\udca5 Highlights"},{"location":"#media-coverage","text":"Check out our official deep dive of UFO on this Youtube Video . UFO sightings have garnered attention from various media outlets, including: Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience \ud83d\ude80 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo\ud83c\udf0c The AI PC - The Future of Computers? - Microsoft UFO \u4e0b\u4e00\u4ee3Windows\u7cfb\u7edf\u66dd\u5149\uff1a\u57fa\u4e8eGPT-4V\uff0cAgent\u8de8\u5e94\u7528\u8c03\u5ea6\uff0c\u4ee3\u53f7UFO \u4e0b\u4e00\u4ee3\u667a\u80fd\u7248 Windows \u8981\u6765\u4e86\uff1f\u5fae\u8f6f\u63a8\u51fa\u9996\u4e2a Windows Agent\uff0c\u547d\u540d\u4e3a UFO\uff01 Microsoft\u767a\u306e\u30aa\u30fc\u30d7\u30f3\u30bd\u30fc\u30b9\u7248\u300cUFO\u300d\u767b\u5834\uff01\u3000Windows\u3092\u81ea\u52d5\u64cd\u7e26\u3059\u308bAI\u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u3092\u8a66\u3059","title":"\ud83c\udf10 Media Coverage"},{"location":"#get-help","text":"\u2754GitHub Issues (prefered) For other communications, please contact ufo-agent@microsoft.com","title":"\u2753Get help"},{"location":"#citation","text":"Our technical report paper can be found here . Note that previous HostAgent and AppAgent in the paper are renamed to HostAgent and AppAgent in the code base to better reflect their functions. If you use UFO in your research, please cite our paper: @article{ufo, title={{UFO: A UI-Focused Agent for Windows OS Interaction}}, author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi}, journal={arXiv preprint arXiv:2402.07939}, year={2024} }","title":"\ud83d\udcda Citation"},{"location":"#related-projects","text":"If you're interested in data analytics agent frameworks, check out TaskWeaver , a code-first LLM agent framework designed for seamlessly planning and executing data analytics tasks. For more information on GUI agents, refer to our survey paper: Large Language Model-Brained GUI Agents: A Survey . You can also explore the survey through: - GitHub Repository - Searchable Website window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-FX17ZGJYGC');","title":"\ud83c\udfa8 Related Projects"},{"location":"faq/","text":"FAQ We provide answers to some frequently asked questions about the UFO. Q1: Why is it called UFO? A: UFO stands for U I Fo cused agent. The name is inspired by the concept of an unidentified flying object (UFO) that is mysterious and futuristic. Q2: Can I use UFO on Linux or macOS? A: UFO is currently only supported on Windows OS. Q3: Why the latency of UFO is high? A: The latency of UFO depends on the response time of the LLMs and the network speed. If you are using GPT, it usually takes dozens of seconds to generate a response in one step. The workload of the GPT endpoint may also affect the latency. Q4: What models does UFO support? A: UFO supports various language models, including OpenAI and Azure OpenAI models, QWEN, google Gimini, Ollama, and more. You can find the full list of supported models in the Supported Models section of the documentation. Q5: Can I use non-vision models in UFO? A: Yes, you can use non-vision models in UFO. You can set the VISUAL_MODE to False in the config.yaml file to disable the visual mode and use non-vision models. However, UFO is designed to work with vision models, and using non-vision models may affect the performance. Q6: Can I host my own LLM endpoint? A: Yes, you can host your custom LLM endpoint and configure UFO to use it. Check the documentation in the Supported Models section for more details. Q7: Can I use non-English requests in UFO? A: It depends on the language model you are using. Most of LLMs support multiple languages, and you can specify the language in the request. However, the performance may vary for different languages. Q8: Why it shows the error Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) ? A: This means the LLM endpoint is not accessible. You can check the network connection (e.g. VPN) and the status of the LLM endpoint. Info To get more support, please submit an issue on the GitHub Issues , or send an email to ufo-agent@microsoft.com .","title":"FAQ"},{"location":"faq/#faq","text":"We provide answers to some frequently asked questions about the UFO.","title":"FAQ"},{"location":"faq/#q1-why-is-it-called-ufo","text":"A: UFO stands for U I Fo cused agent. The name is inspired by the concept of an unidentified flying object (UFO) that is mysterious and futuristic.","title":"Q1: Why is it called UFO?"},{"location":"faq/#q2-can-i-use-ufo-on-linux-or-macos","text":"A: UFO is currently only supported on Windows OS.","title":"Q2: Can I use UFO on Linux or macOS?"},{"location":"faq/#q3-why-the-latency-of-ufo-is-high","text":"A: The latency of UFO depends on the response time of the LLMs and the network speed. If you are using GPT, it usually takes dozens of seconds to generate a response in one step. The workload of the GPT endpoint may also affect the latency.","title":"Q3: Why the latency of UFO is high?"},{"location":"faq/#q4-what-models-does-ufo-support","text":"A: UFO supports various language models, including OpenAI and Azure OpenAI models, QWEN, google Gimini, Ollama, and more. You can find the full list of supported models in the Supported Models section of the documentation.","title":"Q4: What models does UFO support?"},{"location":"faq/#q5-can-i-use-non-vision-models-in-ufo","text":"A: Yes, you can use non-vision models in UFO. You can set the VISUAL_MODE to False in the config.yaml file to disable the visual mode and use non-vision models. However, UFO is designed to work with vision models, and using non-vision models may affect the performance.","title":"Q5: Can I use non-vision models in UFO?"},{"location":"faq/#q6-can-i-host-my-own-llm-endpoint","text":"A: Yes, you can host your custom LLM endpoint and configure UFO to use it. Check the documentation in the Supported Models section for more details.","title":"Q6: Can I host my own LLM endpoint?"},{"location":"faq/#q7-can-i-use-non-english-requests-in-ufo","text":"A: It depends on the language model you are using. Most of LLMs support multiple languages, and you can specify the language in the request. However, the performance may vary for different languages.","title":"Q7: Can I use non-English requests in UFO?"},{"location":"faq/#q8-why-it-shows-the-error-error-making-api-request-connection-aborted-remotedisconnectedremote-end-closed-connection-without-response","text":"A: This means the LLM endpoint is not accessible. You can check the network connection (e.g. VPN) and the status of the LLM endpoint. Info To get more support, please submit an issue on the GitHub Issues , or send an email to ufo-agent@microsoft.com .","title":"Q8: Why it shows the error Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))?"},{"location":"project_directory_structure/","text":"The UFO project is organized into a well-defined directory structure to facilitate development, deployment, and documentation. Below is an overview of each directory and file, along with their purpose: \ud83d\udce6project \u2523 \ud83d\udcc2documents # Folder to store project documentation \u2523 \ud83d\udcc2learner # Folder to build the vector database for help documents \u2523 \ud83d\udcc2model_worker # Folder to store tools for deploying your own model \u2523 \ud83d\udcc2record_processor # Folder to parse human demonstrations from Windows Step Recorder and build the vector database \u2523 \ud83d\udcc2vetordb # Folder to store all data in the vector database for RAG (Retrieval-Augmented Generation) \u2523 \ud83d\udcc2logs # Folder to store logs, generated after the program starts \u2517 \ud83d\udcc2ufo # Directory containing main project code \u2523 \ud83d\udcc2module # Directory for the basic module of UFO, e.g., session and round \u2523 \ud83d\udcc2agents # Code implementation of agents in UFO \u2523 \ud83d\udcc2automator # Implementation of the skill set of agents to automate applications \u2523 \ud83d\udcc2experience # Parse and save the agent's self-experience \u2523 \ud83d\udcc2llm # Folder to store the LLM (Large Language Model) implementation \u2523 \ud83d\udcc2prompter # Prompt constructor for the agent \u2523 \ud83d\udcc2prompts # Prompt templates and files to construct the full prompt \u2523 \ud83d\udcc2rag # Implementation of RAG from different sources to enhance agents' abilities \u2523 \ud83d\udcc2utils # Utility functions \u2523 \ud83d\udcc2config # Configuration files \u2523 \ud83d\udcdcconfig.yaml # User configuration file for LLM and other settings \u2523 \ud83d\udcdcconfig_dev.yaml # Configuration file for developers \u2517 ... \u2517 \ud83d\udcc4ufo.py # Main entry point for the UFO client Directory and File Descriptions documents Purpose: Stores all the project documentation. Details: This may include design documents, user manuals, API documentation, and any other relevant project documentation. learner Purpose: Used to build the vector database for help documents. Details: This directory contains scripts and tools to process help documents and create a searchable vector database, enhancing the agents' ability for task completion. model_worker Purpose: Contains tools and scripts necessary for deploying custom models. Details: This includes model deployment configurations, and management tools for integrating custom models into the project. record_processor Purpose: Parses human demonstrations recorded using the Windows Step Recorder and builds the vector database. Details: This directory includes parsers, data processing scripts, and tools to convert human demonstrations into a format suitable for agent's retrieval. vetordb Purpose: Stores all data within the vector database for Retrieval-Augmented Generation (RAG). Details: This directory is essential for maintaining the data that enhances the agents' ability to retrieve relevant information and generate more accurate responses. logs Purpose: Stores log files generated by the application. Details: This directory helps in monitoring, debugging, and analyzing the application's performance and behavior. Logs are generated dynamically as the application runs. ufo Purpose: The core directory containing the main project code. Details: This directory is further subdivided into multiple subdirectories, each serving a specific purpose within the project. module Purpose: Contains the basic modules of the UFO project, such as session management and rounds. Details: This includes foundational classes and functions that are used throughout the project. agents Purpose: Houses the code implementations of various agents in the UFO project. Details: Agents are components that perform specific tasks within the system, and this directory contains their logic, components, and behavior. automator Purpose: Implements the skill set of agents to automate applications. Details: This includes scripts and tools that enable agents to interact with and automate tasks in various applications, such as mouse and keyboard actions and API calls. experience Purpose: Parses and saves the agent's self-experience. Details: This directory contains mechanisms for agents to learn from their actions and outcomes, improving their performance over time. llm Purpose: Stores the implementation of the Large Language Model (LLM). Details: This includes the implementation of APIs for different language models, such as GPT, Genimi, QWEN, etc., that are used by the agents. prompter Purpose: Constructs prompts for the agents. Details: This directory includes prompt construction logic and tools that help agents generate meaningful prompts for user interactions. prompts Purpose: Contains prompt templates and files used to construct the full prompt. Details: This includes predefined prompt structures and content that are used to create meaningful interactions with the agents. rag Purpose: Implements Retrieval-Augmented Generation (RAG) from different sources to enhance the agents' abilities. etails: This directory includes scripts and tools for integrating various data sources into the RAG framework, improving the accuracy and relevance of the agents' outputs. utils Purpose: Contains utility functions. Details: This directory includes helper functions, common utilities, and other reusable code snippets that support the project's operations. config Purpose: Stores configuration files. Details: This directory includes different configuration files for various environments and purposes. config.yaml: User configuration file for LLM and other settings. You need to rename config.yaml.template to config.yaml and edit the configuration settings as needed. config_dev.yaml : Developer-specific configuration file with settings tailored for development purposes. ufo.py Purpose: Main entry point for the UFO client. Details: This script initializes and starts the UFO application.","title":"Project Directory Structure"},{"location":"project_directory_structure/#directory-and-file-descriptions","text":"","title":"Directory and File Descriptions"},{"location":"project_directory_structure/#documents","text":"Purpose: Stores all the project documentation. Details: This may include design documents, user manuals, API documentation, and any other relevant project documentation.","title":"documents"},{"location":"project_directory_structure/#learner","text":"Purpose: Used to build the vector database for help documents. Details: This directory contains scripts and tools to process help documents and create a searchable vector database, enhancing the agents' ability for task completion.","title":"learner"},{"location":"project_directory_structure/#model_worker","text":"Purpose: Contains tools and scripts necessary for deploying custom models. Details: This includes model deployment configurations, and management tools for integrating custom models into the project.","title":"model_worker"},{"location":"project_directory_structure/#record_processor","text":"Purpose: Parses human demonstrations recorded using the Windows Step Recorder and builds the vector database. Details: This directory includes parsers, data processing scripts, and tools to convert human demonstrations into a format suitable for agent's retrieval.","title":"record_processor"},{"location":"project_directory_structure/#vetordb","text":"Purpose: Stores all data within the vector database for Retrieval-Augmented Generation (RAG). Details: This directory is essential for maintaining the data that enhances the agents' ability to retrieve relevant information and generate more accurate responses.","title":"vetordb"},{"location":"project_directory_structure/#logs","text":"Purpose: Stores log files generated by the application. Details: This directory helps in monitoring, debugging, and analyzing the application's performance and behavior. Logs are generated dynamically as the application runs.","title":"logs"},{"location":"project_directory_structure/#ufo","text":"Purpose: The core directory containing the main project code. Details: This directory is further subdivided into multiple subdirectories, each serving a specific purpose within the project.","title":"ufo"},{"location":"project_directory_structure/#module","text":"Purpose: Contains the basic modules of the UFO project, such as session management and rounds. Details: This includes foundational classes and functions that are used throughout the project.","title":"module"},{"location":"project_directory_structure/#agents","text":"Purpose: Houses the code implementations of various agents in the UFO project. Details: Agents are components that perform specific tasks within the system, and this directory contains their logic, components, and behavior.","title":"agents"},{"location":"project_directory_structure/#automator","text":"Purpose: Implements the skill set of agents to automate applications. Details: This includes scripts and tools that enable agents to interact with and automate tasks in various applications, such as mouse and keyboard actions and API calls.","title":"automator"},{"location":"project_directory_structure/#experience","text":"Purpose: Parses and saves the agent's self-experience. Details: This directory contains mechanisms for agents to learn from their actions and outcomes, improving their performance over time.","title":"experience"},{"location":"project_directory_structure/#llm","text":"Purpose: Stores the implementation of the Large Language Model (LLM). Details: This includes the implementation of APIs for different language models, such as GPT, Genimi, QWEN, etc., that are used by the agents.","title":"llm"},{"location":"project_directory_structure/#prompter","text":"Purpose: Constructs prompts for the agents. Details: This directory includes prompt construction logic and tools that help agents generate meaningful prompts for user interactions.","title":"prompter"},{"location":"project_directory_structure/#prompts","text":"Purpose: Contains prompt templates and files used to construct the full prompt. Details: This includes predefined prompt structures and content that are used to create meaningful interactions with the agents.","title":"prompts"},{"location":"project_directory_structure/#rag","text":"Purpose: Implements Retrieval-Augmented Generation (RAG) from different sources to enhance the agents' abilities. etails: This directory includes scripts and tools for integrating various data sources into the RAG framework, improving the accuracy and relevance of the agents' outputs.","title":"rag"},{"location":"project_directory_structure/#utils","text":"Purpose: Contains utility functions. Details: This directory includes helper functions, common utilities, and other reusable code snippets that support the project's operations.","title":"utils"},{"location":"project_directory_structure/#config","text":"Purpose: Stores configuration files. Details: This directory includes different configuration files for various environments and purposes. config.yaml: User configuration file for LLM and other settings. You need to rename config.yaml.template to config.yaml and edit the configuration settings as needed. config_dev.yaml : Developer-specific configuration file with settings tailored for development purposes.","title":"config"},{"location":"project_directory_structure/#ufopy","text":"Purpose: Main entry point for the UFO client. Details: This script initializes and starts the UFO application.","title":"ufo.py"},{"location":"about/CODE_OF_CONDUCT/","text":"Microsoft Open Source Code of Conduct This project has adopted the Microsoft Open Source Code of Conduct . Resources: Microsoft Open Source Code of Conduct Microsoft Code of Conduct FAQ Contact opencode@microsoft.com with questions or concerns","title":"Code of Conduct"},{"location":"about/CODE_OF_CONDUCT/#microsoft-open-source-code-of-conduct","text":"This project has adopted the Microsoft Open Source Code of Conduct . Resources: Microsoft Open Source Code of Conduct Microsoft Code of Conduct FAQ Contact opencode@microsoft.com with questions or concerns","title":"Microsoft Open Source Code of Conduct"},{"location":"about/CONTRIBUTING/","text":"Contributing This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com. When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. Note You should sunmit your pull request to the pre-release branch, not the main branch. This project has adopted the Microsoft Open Source Code of Conduct . For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.","title":"Contributing"},{"location":"about/CONTRIBUTING/#contributing","text":"This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com. When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. Note You should sunmit your pull request to the pre-release branch, not the main branch. This project has adopted the Microsoft Open Source Code of Conduct . For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.","title":"Contributing"},{"location":"about/DISCLAIMER/","text":"Disclaimer: Code Execution and Data Handling Notice By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices: 1. Code Functionality: The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. These screenshots will be processed and sent to the GPT model for inference. 2. Data Privacy and Storage: It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft. 3. User Responsibility: By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process. 4. Security Measures: Microsoft has implemented security measures to safeguard the action execution. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. Ensure that you are running the latest security updates on your system. 5. Consent for Inference: You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code. 6. No Guarantee of Accuracy: The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model. 7. Indemnification: Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo. 8. Reporting Infringements: If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary. 9. Modifications to the Disclaimer: Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes. By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.","title":"Disclaimer"},{"location":"about/DISCLAIMER/#disclaimer-code-execution-and-data-handling-notice","text":"By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices:","title":"Disclaimer: Code Execution and Data Handling Notice"},{"location":"about/DISCLAIMER/#1-code-functionality","text":"The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. These screenshots will be processed and sent to the GPT model for inference.","title":"1. Code Functionality:"},{"location":"about/DISCLAIMER/#2-data-privacy-and-storage","text":"It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft.","title":"2. Data Privacy and Storage:"},{"location":"about/DISCLAIMER/#3-user-responsibility","text":"By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process.","title":"3. User Responsibility:"},{"location":"about/DISCLAIMER/#4-security-measures","text":"Microsoft has implemented security measures to safeguard the action execution. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. Ensure that you are running the latest security updates on your system.","title":"4. Security Measures:"},{"location":"about/DISCLAIMER/#5-consent-for-inference","text":"You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code.","title":"5. Consent for Inference:"},{"location":"about/DISCLAIMER/#6-no-guarantee-of-accuracy","text":"The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model.","title":"6. No Guarantee of Accuracy:"},{"location":"about/DISCLAIMER/#7-indemnification","text":"Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo.","title":"7. Indemnification:"},{"location":"about/DISCLAIMER/#8-reporting-infringements","text":"If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary.","title":"8. Reporting Infringements:"},{"location":"about/DISCLAIMER/#9-modifications-to-the-disclaimer","text":"Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes. By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.","title":"9. Modifications to the Disclaimer:"},{"location":"about/LICENSE/","text":"Copyright (c) Microsoft Corporation. MIT License Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"License"},{"location":"about/LICENSE/#mit-license","text":"Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"about/SUPPORT/","text":"Support How to file issues and get help This project uses GitHub Issues to track bugs and feature requests. Please search the existing issues before filing new issues to avoid duplicates. For new issues, file your bug or feature request as a new Issue. You may use GitHub Issues to raise questions, bug reports, and feature requests. For help and questions about using this project, please please contact ufo-agent@microsoft.com . Microsoft Support Policy Support for this PROJECT or PRODUCT is limited to the resources listed above.","title":"Support"},{"location":"about/SUPPORT/#support","text":"","title":"Support"},{"location":"about/SUPPORT/#how-to-file-issues-and-get-help","text":"This project uses GitHub Issues to track bugs and feature requests. Please search the existing issues before filing new issues to avoid duplicates. For new issues, file your bug or feature request as a new Issue. You may use GitHub Issues to raise questions, bug reports, and feature requests. For help and questions about using this project, please please contact ufo-agent@microsoft.com .","title":"How to file issues and get help"},{"location":"about/SUPPORT/#microsoft-support-policy","text":"Support for this PROJECT or PRODUCT is limited to the resources listed above.","title":"Microsoft Support Policy"},{"location":"advanced_usage/customization/","text":"Customization Sometimes, UFO may need additional context or information to complete a task. These information are important and customized for each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user. Scenario Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again. Implementation We currently implement the customization feature in the HostAgent class. When the HostAgent needs additional information, it will transit to the PENDING state and ask the user for the information. The user will provide the information, and the HostAgent will save it in the local memory base for future reference. The saved information is stored in the blackboard and can be accessed by all agents in the session. Note The customization memory base is only saved in a local file . These information will not upload to the cloud or any other storage to protect the user's privacy. Configuration You can configure the customization feature by setting the following field in the config_dev.yaml file. Configuration Option Description Type Default Value USE_CUSTOMIZATION Whether to enable the customization. Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20","title":"Customization"},{"location":"advanced_usage/customization/#customization","text":"Sometimes, UFO may need additional context or information to complete a task. These information are important and customized for each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.","title":"Customization"},{"location":"advanced_usage/customization/#scenario","text":"Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again.","title":"Scenario"},{"location":"advanced_usage/customization/#implementation","text":"We currently implement the customization feature in the HostAgent class. When the HostAgent needs additional information, it will transit to the PENDING state and ask the user for the information. The user will provide the information, and the HostAgent will save it in the local memory base for future reference. The saved information is stored in the blackboard and can be accessed by all agents in the session. Note The customization memory base is only saved in a local file . These information will not upload to the cloud or any other storage to protect the user's privacy.","title":"Implementation"},{"location":"advanced_usage/customization/#configuration","text":"You can configure the customization feature by setting the following field in the config_dev.yaml file. Configuration Option Description Type Default Value USE_CUSTOMIZATION Whether to enable the customization. Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20","title":"Configuration"},{"location":"advanced_usage/follower_mode/","text":"Follower Mode The Follower mode is a feature of UFO that the agent follows a list of pre-defined steps in natural language to take actions on applications. Different from the normal mode, this mode creates an FollowerAgent that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging and software testing or verification. Quick Start Step 1: Create a Plan file Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields: Field Description Type task The task description. String steps The list of steps for the agent to follow. List of Strings object The application or file to interact with. String Below is an example of a plan file: { \"task\": \"Type in a text of 'Test For Fun' with heading 1 level\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.Select the 'Test For Fun' text\", \"3.Click 'Home' tab to show the 'Styles' ribbon tab\", \"4.Click 'Styles' ribbon tab to show the style 'Heading 1'\", \"5.Click 'Heading 1' style to apply the style to the selected text\" ], \"object\": \"draft.docx\" } Note The object field is the application or file that the agent will interact with. The object must be active (can be minimized) when starting the Follower mode. Step 2: Start the Follower Mode To start the Follower mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_file} Tip Replace {task_name} with the name of the task and {plan_file} with the path to the plan file. Step 3: Run in Batch (Optional) You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_folder} UFO will automatically detect the plan files in the folder and run them one by one. Tip Replace {task_name} with the name of the task and {plan_folder} with the path to the folder containing plan files. Evaluation You may want to evaluate the task is completed successfully or not by following the plan. UFO will call the EvaluationAgent to evaluate the task if EVA_SESSION is set to True in the config_dev.yaml file. You can check the evaluation log in the logs/{task_name}/evaluation.log file. References The follower mode employs a PlanReader to parse the plan file and create a FollowerSession to follow the plan. PlanReader The PlanReader is located in the ufo/module/sessions/plan_reader.py file. The reader for a plan file. Initialize a plan reader. Parameters: plan_file ( str ) \u2013 The path of the plan file. Source code in module/sessions/plan_reader.py 17 18 19 20 21 22 23 24 25 def __init__ ( self , plan_file : str ): \"\"\" Initialize a plan reader. :param plan_file: The path of the plan file. \"\"\" with open ( plan_file , \"r\" ) as f : self . plan = json . load ( f ) self . remaining_steps = self . get_steps () get_host_agent_request () Get the request for the host agent. Returns: str \u2013 The request for the host agent. Source code in module/sessions/plan_reader.py 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def get_host_agent_request ( self ) -> str : \"\"\" Get the request for the host agent. :return: The request for the host agent. \"\"\" object_name = self . get_operation_object () request = ( f \"Open and select the application of { object_name } , and output the FINISH status immediately. \" \"You must output the selected application with their control text and label even if it is already open.\" ) return request get_initial_request () Get the initial request in the plan. Returns: str \u2013 The initial request. Source code in module/sessions/plan_reader.py 51 52 53 54 55 56 57 58 59 60 61 62 def get_initial_request ( self ) -> str : \"\"\" Get the initial request in the plan. :return: The initial request. \"\"\" task = self . get_task () object_name = self . get_operation_object () request = f \" { task } in { object_name } \" return request get_operation_object () Get the operation object in the step. Returns: str \u2013 The operation object. Source code in module/sessions/plan_reader.py 43 44 45 46 47 48 49 def get_operation_object ( self ) -> str : \"\"\" Get the operation object in the step. :return: The operation object. \"\"\" return self . plan . get ( \"object\" , \"\" ) get_steps () Get the steps in the plan. Returns: List [ str ] \u2013 The steps in the plan. Source code in module/sessions/plan_reader.py 35 36 37 38 39 40 41 def get_steps ( self ) -> List [ str ]: \"\"\" Get the steps in the plan. :return: The steps in the plan. \"\"\" return self . plan . get ( \"steps\" , []) get_task () Get the task name. Returns: str \u2013 The task name. Source code in module/sessions/plan_reader.py 27 28 29 30 31 32 33 def get_task ( self ) -> str : \"\"\" Get the task name. :return: The task name. \"\"\" return self . plan . get ( \"task\" , \"\" ) next_step () Get the next step in the plan. Returns: Optional [ str ] \u2013 The next step. Source code in module/sessions/plan_reader.py 79 80 81 82 83 84 85 86 87 88 89 def next_step ( self ) -> Optional [ str ]: \"\"\" Get the next step in the plan. :return: The next step. \"\"\" if self . remaining_steps : step = self . remaining_steps . pop ( 0 ) return step return None task_finished () Check if the task is finished. Returns: bool \u2013 True if the task is finished, False otherwise. Source code in module/sessions/plan_reader.py 91 92 93 94 95 96 97 def task_finished ( self ) -> bool : \"\"\" Check if the task is finished. :return: True if the task is finished, False otherwise. \"\"\" return not self . remaining_steps FollowerSession The FollowerSession is also located in the ufo/module/sessions/session.py file. Bases: BaseSession A session for following a list of plan for action taken. This session is used for the follower agent, which accepts a plan file to follow using the PlanReader. Initialize a session. Parameters: task ( str ) \u2013 The name of current task. plan_file ( str ) \u2013 The path of the plan file to follow. should_evaluate ( bool ) \u2013 Whether to evaluate the session. id ( int ) \u2013 The id of the session. Source code in module/sessions/session.py 197 198 199 200 201 202 203 204 205 206 207 208 209 210 def __init__ ( self , task : str , plan_file : str , should_evaluate : bool , id : int ) -> None : \"\"\" Initialize a session. :param task: The name of current task. :param plan_file: The path of the plan file to follow. :param should_evaluate: Whether to evaluate the session. :param id: The id of the session. \"\"\" super () . __init__ ( task , should_evaluate , id ) self . plan_reader = PlanReader ( plan_file ) create_new_round () Create a new round. Source code in module/sessions/session.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 def create_new_round ( self ) -> None : \"\"\" Create a new round. \"\"\" # Get a request for the new round. request = self . next_request () # Create a new round and return None if the session is finished. if self . is_finished (): return None if self . total_rounds == 0 : utils . print_with_color ( \"Complete the following request:\" , \"yellow\" ) utils . print_with_color ( self . plan_reader . get_initial_request (), \"cyan\" ) agent = self . _host_agent else : agent = self . _host_agent . get_active_appagent () # Clear the memory and set the state to continue the app agent. agent . clear_memory () agent . blackboard . requests . clear () agent . set_state ( ContinueAppAgentState ()) round = BaseRound ( request = request , agent = agent , context = self . context , should_evaluate = configs . get ( \"EVA_ROUND\" , False ), id = self . total_rounds , ) self . add_round ( round . id , round ) return round next_request () Get the request for the new round. Source code in module/sessions/session.py 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 def next_request ( self ) -> str : \"\"\" Get the request for the new round. \"\"\" # If the task is finished, return an empty string. if self . plan_reader . task_finished (): self . _finish = True return \"\" # Get the request from the plan reader. if self . total_rounds == 0 : return self . plan_reader . get_host_agent_request () else : return self . plan_reader . next_step () request_to_evaluate () Check if the session should be evaluated. Returns: bool \u2013 True if the session should be evaluated, False otherwise. Source code in module/sessions/session.py 273 274 275 276 277 278 279 def request_to_evaluate ( self ) -> bool : \"\"\" Check if the session should be evaluated. :return: True if the session should be evaluated, False otherwise. \"\"\" return self . plan_reader . get_task ()","title":"Follower Mode"},{"location":"advanced_usage/follower_mode/#follower-mode","text":"The Follower mode is a feature of UFO that the agent follows a list of pre-defined steps in natural language to take actions on applications. Different from the normal mode, this mode creates an FollowerAgent that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging and software testing or verification.","title":"Follower Mode"},{"location":"advanced_usage/follower_mode/#quick-start","text":"","title":"Quick Start"},{"location":"advanced_usage/follower_mode/#step-1-create-a-plan-file","text":"Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields: Field Description Type task The task description. String steps The list of steps for the agent to follow. List of Strings object The application or file to interact with. String Below is an example of a plan file: { \"task\": \"Type in a text of 'Test For Fun' with heading 1 level\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.Select the 'Test For Fun' text\", \"3.Click 'Home' tab to show the 'Styles' ribbon tab\", \"4.Click 'Styles' ribbon tab to show the style 'Heading 1'\", \"5.Click 'Heading 1' style to apply the style to the selected text\" ], \"object\": \"draft.docx\" } Note The object field is the application or file that the agent will interact with. The object must be active (can be minimized) when starting the Follower mode.","title":"Step 1: Create a Plan file"},{"location":"advanced_usage/follower_mode/#step-2-start-the-follower-mode","text":"To start the Follower mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_file} Tip Replace {task_name} with the name of the task and {plan_file} with the path to the plan file.","title":"Step 2: Start the Follower Mode"},{"location":"advanced_usage/follower_mode/#step-3-run-in-batch-optional","text":"You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_folder} UFO will automatically detect the plan files in the folder and run them one by one. Tip Replace {task_name} with the name of the task and {plan_folder} with the path to the folder containing plan files.","title":"Step 3: Run in Batch (Optional)"},{"location":"advanced_usage/follower_mode/#evaluation","text":"You may want to evaluate the task is completed successfully or not by following the plan. UFO will call the EvaluationAgent to evaluate the task if EVA_SESSION is set to True in the config_dev.yaml file. You can check the evaluation log in the logs/{task_name}/evaluation.log file.","title":"Evaluation"},{"location":"advanced_usage/follower_mode/#references","text":"The follower mode employs a PlanReader to parse the plan file and create a FollowerSession to follow the plan.","title":"References"},{"location":"advanced_usage/follower_mode/#planreader","text":"The PlanReader is located in the ufo/module/sessions/plan_reader.py file. The reader for a plan file. Initialize a plan reader. Parameters: plan_file ( str ) \u2013 The path of the plan file. Source code in module/sessions/plan_reader.py 17 18 19 20 21 22 23 24 25 def __init__ ( self , plan_file : str ): \"\"\" Initialize a plan reader. :param plan_file: The path of the plan file. \"\"\" with open ( plan_file , \"r\" ) as f : self . plan = json . load ( f ) self . remaining_steps = self . get_steps ()","title":"PlanReader"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_host_agent_request","text":"Get the request for the host agent. Returns: str \u2013 The request for the host agent. Source code in module/sessions/plan_reader.py 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def get_host_agent_request ( self ) -> str : \"\"\" Get the request for the host agent. :return: The request for the host agent. \"\"\" object_name = self . get_operation_object () request = ( f \"Open and select the application of { object_name } , and output the FINISH status immediately. \" \"You must output the selected application with their control text and label even if it is already open.\" ) return request","title":"get_host_agent_request"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_initial_request","text":"Get the initial request in the plan. Returns: str \u2013 The initial request. Source code in module/sessions/plan_reader.py 51 52 53 54 55 56 57 58 59 60 61 62 def get_initial_request ( self ) -> str : \"\"\" Get the initial request in the plan. :return: The initial request. \"\"\" task = self . get_task () object_name = self . get_operation_object () request = f \" { task } in { object_name } \" return request","title":"get_initial_request"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_operation_object","text":"Get the operation object in the step. Returns: str \u2013 The operation object. Source code in module/sessions/plan_reader.py 43 44 45 46 47 48 49 def get_operation_object ( self ) -> str : \"\"\" Get the operation object in the step. :return: The operation object. \"\"\" return self . plan . get ( \"object\" , \"\" )","title":"get_operation_object"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_steps","text":"Get the steps in the plan. Returns: List [ str ] \u2013 The steps in the plan. Source code in module/sessions/plan_reader.py 35 36 37 38 39 40 41 def get_steps ( self ) -> List [ str ]: \"\"\" Get the steps in the plan. :return: The steps in the plan. \"\"\" return self . plan . get ( \"steps\" , [])","title":"get_steps"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_task","text":"Get the task name. Returns: str \u2013 The task name. Source code in module/sessions/plan_reader.py 27 28 29 30 31 32 33 def get_task ( self ) -> str : \"\"\" Get the task name. :return: The task name. \"\"\" return self . plan . get ( \"task\" , \"\" )","title":"get_task"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.next_step","text":"Get the next step in the plan. Returns: Optional [ str ] \u2013 The next step. Source code in module/sessions/plan_reader.py 79 80 81 82 83 84 85 86 87 88 89 def next_step ( self ) -> Optional [ str ]: \"\"\" Get the next step in the plan. :return: The next step. \"\"\" if self . remaining_steps : step = self . remaining_steps . pop ( 0 ) return step return None","title":"next_step"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.task_finished","text":"Check if the task is finished. Returns: bool \u2013 True if the task is finished, False otherwise. Source code in module/sessions/plan_reader.py 91 92 93 94 95 96 97 def task_finished ( self ) -> bool : \"\"\" Check if the task is finished. :return: True if the task is finished, False otherwise. \"\"\" return not self . remaining_steps","title":"task_finished"},{"location":"advanced_usage/follower_mode/#followersession","text":"The FollowerSession is also located in the ufo/module/sessions/session.py file. Bases: BaseSession A session for following a list of plan for action taken. This session is used for the follower agent, which accepts a plan file to follow using the PlanReader. Initialize a session. Parameters: task ( str ) \u2013 The name of current task. plan_file ( str ) \u2013 The path of the plan file to follow. should_evaluate ( bool ) \u2013 Whether to evaluate the session. id ( int ) \u2013 The id of the session. Source code in module/sessions/session.py 197 198 199 200 201 202 203 204 205 206 207 208 209 210 def __init__ ( self , task : str , plan_file : str , should_evaluate : bool , id : int ) -> None : \"\"\" Initialize a session. :param task: The name of current task. :param plan_file: The path of the plan file to follow. :param should_evaluate: Whether to evaluate the session. :param id: The id of the session. \"\"\" super () . __init__ ( task , should_evaluate , id ) self . plan_reader = PlanReader ( plan_file )","title":"FollowerSession"},{"location":"advanced_usage/follower_mode/#module.sessions.session.FollowerSession.create_new_round","text":"Create a new round. Source code in module/sessions/session.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 def create_new_round ( self ) -> None : \"\"\" Create a new round. \"\"\" # Get a request for the new round. request = self . next_request () # Create a new round and return None if the session is finished. if self . is_finished (): return None if self . total_rounds == 0 : utils . print_with_color ( \"Complete the following request:\" , \"yellow\" ) utils . print_with_color ( self . plan_reader . get_initial_request (), \"cyan\" ) agent = self . _host_agent else : agent = self . _host_agent . get_active_appagent () # Clear the memory and set the state to continue the app agent. agent . clear_memory () agent . blackboard . requests . clear () agent . set_state ( ContinueAppAgentState ()) round = BaseRound ( request = request , agent = agent , context = self . context , should_evaluate = configs . get ( \"EVA_ROUND\" , False ), id = self . total_rounds , ) self . add_round ( round . id , round ) return round","title":"create_new_round"},{"location":"advanced_usage/follower_mode/#module.sessions.session.FollowerSession.next_request","text":"Get the request for the new round. Source code in module/sessions/session.py 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 def next_request ( self ) -> str : \"\"\" Get the request for the new round. \"\"\" # If the task is finished, return an empty string. if self . plan_reader . task_finished (): self . _finish = True return \"\" # Get the request from the plan reader. if self . total_rounds == 0 : return self . plan_reader . get_host_agent_request () else : return self . plan_reader . next_step ()","title":"next_request"},{"location":"advanced_usage/follower_mode/#module.sessions.session.FollowerSession.request_to_evaluate","text":"Check if the session should be evaluated. Returns: bool \u2013 True if the session should be evaluated, False otherwise. Source code in module/sessions/session.py 273 274 275 276 277 278 279 def request_to_evaluate ( self ) -> bool : \"\"\" Check if the session should be evaluated. :return: True if the session should be evaluated, False otherwise. \"\"\" return self . plan_reader . get_task ()","title":"request_to_evaluate"},{"location":"advanced_usage/control_filtering/icon_filtering/","text":"Icon Filter The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings. Configuration To activate the icon control filtering, you need to add ICON to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed icon control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add ICON to the list. CONTROL_FILTER_TOP_K_ICON : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_ICON_NAME : The control filter model name for icon similarity. By default, it is set to \"clip-ViT-B-32\". Reference Bases: BasicControlFilter A class that represents a icon model for control filtering. control_filter ( control_dicts , cropped_icons_dict , plans , top_k ) Filters control items based on their scores and returns the top-k items. Parameters: control_dicts \u2013 The dictionary of all control items. cropped_icons_dict \u2013 The dictionary of the cropped icons. plans \u2013 The plans to compare the control icons against. top_k \u2013 The number of top items to return. Returns: \u2013 The list of top-k control items based on their scores. Source code in automator/ui_control/control_filter.py 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 def control_filter ( self , control_dicts , cropped_icons_dict , plans , top_k ): \"\"\" Filters control items based on their scores and returns the top-k items. :param control_dicts: The dictionary of all control items. :param cropped_icons_dict: The dictionary of the cropped icons. :param plans: The plans to compare the control icons against. :param top_k: The number of top items to return. :return: The list of top-k control items based on their scores. \"\"\" scores_items = [] filtered_control_dict = {} for label , cropped_icon in cropped_icons_dict . items (): score = self . control_filter_score ( cropped_icon , plans ) scores_items . append (( score , label )) topk_scores_items = heapq . nlargest ( top_k , scores_items , key = lambda x : x [ 0 ]) topk_labels = [ scores_items [ 1 ] for scores_items in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_labels : filtered_control_dict [ label ] = control_item return filtered_control_dict control_filter_score ( control_icon , plans ) Calculates the score of a control icon based on its similarity to the given keywords. Parameters: control_icon \u2013 The control icon image. plans \u2013 The plan to compare the control icon against. Returns: \u2013 The maximum similarity score between the control icon and the keywords. Source code in automator/ui_control/control_filter.py 240 241 242 243 244 245 246 247 248 249 250 def control_filter_score ( self , control_icon , plans ): \"\"\" Calculates the score of a control icon based on its similarity to the given keywords. :param control_icon: The control icon image. :param plans: The plan to compare the control icon against. :return: The maximum similarity score between the control icon and the keywords. \"\"\" plans_embedding = self . get_embedding ( plans ) control_icon_embedding = self . get_embedding ( control_icon ) return max ( self . cos_sim ( control_icon_embedding , plans_embedding ) . tolist ()[ 0 ])","title":"Icon Filtering"},{"location":"advanced_usage/control_filtering/icon_filtering/#icon-filter","text":"The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings.","title":"Icon Filter"},{"location":"advanced_usage/control_filtering/icon_filtering/#configuration","text":"To activate the icon control filtering, you need to add ICON to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed icon control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add ICON to the list. CONTROL_FILTER_TOP_K_ICON : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_ICON_NAME : The control filter model name for icon similarity. By default, it is set to \"clip-ViT-B-32\".","title":"Configuration"},{"location":"advanced_usage/control_filtering/icon_filtering/#reference","text":"Bases: BasicControlFilter A class that represents a icon model for control filtering.","title":"Reference"},{"location":"advanced_usage/control_filtering/icon_filtering/#automator.ui_control.control_filter.IconControlFilter.control_filter","text":"Filters control items based on their scores and returns the top-k items. Parameters: control_dicts \u2013 The dictionary of all control items. cropped_icons_dict \u2013 The dictionary of the cropped icons. plans \u2013 The plans to compare the control icons against. top_k \u2013 The number of top items to return. Returns: \u2013 The list of top-k control items based on their scores. Source code in automator/ui_control/control_filter.py 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 def control_filter ( self , control_dicts , cropped_icons_dict , plans , top_k ): \"\"\" Filters control items based on their scores and returns the top-k items. :param control_dicts: The dictionary of all control items. :param cropped_icons_dict: The dictionary of the cropped icons. :param plans: The plans to compare the control icons against. :param top_k: The number of top items to return. :return: The list of top-k control items based on their scores. \"\"\" scores_items = [] filtered_control_dict = {} for label , cropped_icon in cropped_icons_dict . items (): score = self . control_filter_score ( cropped_icon , plans ) scores_items . append (( score , label )) topk_scores_items = heapq . nlargest ( top_k , scores_items , key = lambda x : x [ 0 ]) topk_labels = [ scores_items [ 1 ] for scores_items in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_labels : filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"control_filter"},{"location":"advanced_usage/control_filtering/icon_filtering/#automator.ui_control.control_filter.IconControlFilter.control_filter_score","text":"Calculates the score of a control icon based on its similarity to the given keywords. Parameters: control_icon \u2013 The control icon image. plans \u2013 The plan to compare the control icon against. Returns: \u2013 The maximum similarity score between the control icon and the keywords. Source code in automator/ui_control/control_filter.py 240 241 242 243 244 245 246 247 248 249 250 def control_filter_score ( self , control_icon , plans ): \"\"\" Calculates the score of a control icon based on its similarity to the given keywords. :param control_icon: The control icon image. :param plans: The plan to compare the control icon against. :return: The maximum similarity score between the control icon and the keywords. \"\"\" plans_embedding = self . get_embedding ( plans ) control_icon_embedding = self . get_embedding ( control_icon ) return max ( self . cos_sim ( control_icon_embedding , plans_embedding ) . tolist ()[ 0 ])","title":"control_filter_score"},{"location":"advanced_usage/control_filtering/overview/","text":"Control Filtering There may be many controls items in the application, which may not be relevant to the task. UFO can filter out the irrelevant controls and only focus on the relevant ones. This filtering process can reduce the complexity of the task. Execept for configuring the control types for selection on CONTROL_LIST in config_dev.yaml , UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currerntly support the following filtering methods: Filtering Method Description Text Filter the controls based on the control text. Semantic Filter the controls based on the semantic similarity. Icon Filter the controls based on the control icon image. Configuration You can activate the control filtering by setting the CONTROL_FILTER in the config_dev.yaml file. The CONTROL_FILTER is a list of filtering methods that you want to apply to the controls, which can be TEXT , SEMANTIC , or ICON . You can configure multiple filtering methods in the CONTROL_FILTER list. Reference The implementation of the control filtering is base on the BasicControlFilter class located in the ufo/automator/ui_control/control_filter.py file. Concrete filtering class inherit from the BasicControlFilter class and implement the control_filter method to filter the controls based on the specific filtering method. BasicControlFilter represents a model for filtering control items. __new__ ( model_path ) Creates a new instance of BasicControlFilter. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The BasicControlFilter instance. Source code in automator/ui_control/control_filter.py 72 73 74 75 76 77 78 79 80 81 82 def __new__ ( cls , model_path ): \"\"\" Creates a new instance of BasicControlFilter. :param model_path: The path to the model. :return: The BasicControlFilter instance. \"\"\" if model_path not in cls . _instances : instance = super ( BasicControlFilter , cls ) . __new__ ( cls ) instance . model = cls . load_model ( model_path ) cls . _instances [ model_path ] = instance return cls . _instances [ model_path ] control_filter ( control_dicts , plans , ** kwargs ) abstractmethod Calculates the cosine similarity between the embeddings of the given keywords and the control item. Parameters: control_dicts \u2013 The control item to be compared with the plans. plans \u2013 The plans to be used for calculating the similarity. Returns: \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 104 105 106 107 108 109 110 111 112 @abstractmethod def control_filter ( self , control_dicts , plans , ** kwargs ): \"\"\" Calculates the cosine similarity between the embeddings of the given keywords and the control item. :param control_dicts: The control item to be compared with the plans. :param plans: The plans to be used for calculating the similarity. :return: The filtered control items. \"\"\" pass cos_sim ( embedding1 , embedding2 ) staticmethod Computes the cosine similarity between two embeddings. Parameters: embedding1 \u2013 The first embedding. embedding2 \u2013 The second embedding. Returns: float \u2013 The cosine similarity between the two embeddings. Source code in automator/ui_control/control_filter.py 153 154 155 156 157 158 159 160 161 162 163 @staticmethod def cos_sim ( embedding1 , embedding2 ) -> float : \"\"\" Computes the cosine similarity between two embeddings. :param embedding1: The first embedding. :param embedding2: The second embedding. :return: The cosine similarity between the two embeddings. \"\"\" import sentence_transformers return sentence_transformers . util . cos_sim ( embedding1 , embedding2 ) get_embedding ( content ) Encodes the given object into an embedding. Parameters: content \u2013 The content to encode. Returns: \u2013 The embedding of the object. Source code in automator/ui_control/control_filter.py 95 96 97 98 99 100 101 102 def get_embedding ( self , content ): \"\"\" Encodes the given object into an embedding. :param content: The content to encode. :return: The embedding of the object. \"\"\" return self . model . encode ( content ) load_model ( model_path ) staticmethod Loads the model from the given model path. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The loaded model. Source code in automator/ui_control/control_filter.py 84 85 86 87 88 89 90 91 92 93 @staticmethod def load_model ( model_path ): \"\"\" Loads the model from the given model path. :param model_path: The path to the model. :return: The loaded model. \"\"\" import sentence_transformers return sentence_transformers . SentenceTransformer ( model_path ) plans_to_keywords ( plans ) staticmethod Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. Parameters: plans ( List [ str ] ) \u2013 The plan to be parsed. Returns: List [ str ] \u2013 A list of keywords extracted from the plan. Source code in automator/ui_control/control_filter.py 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 @staticmethod def plans_to_keywords ( plans : List [ str ]) -> List [ str ]: \"\"\" Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. :param plans: The plan to be parsed. :return: A list of keywords extracted from the plan. \"\"\" keywords = [] for plan in plans : words = plan . replace ( \"'\" , \"\" ) . strip ( \".\" ) . split () words = [ word for word in words if word . isalpha () or bool ( re . fullmatch ( r \"[\\u4e00-\\u9fa5]+\" , word )) ] keywords . extend ( words ) return keywords remove_stopwords ( keywords ) staticmethod Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). Parameters: keywords \u2013 The list of keywords to be filtered. Returns: \u2013 The list of keywords with the stopwords removed. Source code in automator/ui_control/control_filter.py 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 @staticmethod def remove_stopwords ( keywords ): \"\"\" Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). :param keywords: The list of keywords to be filtered. :return: The list of keywords with the stopwords removed. \"\"\" try : from nltk.corpus import stopwords stopwords_list = stopwords . words ( \"english\" ) except LookupError as e : import nltk nltk . download ( \"stopwords\" ) stopwords_list = nltk . corpus . stopwords . words ( \"english\" ) return [ keyword for keyword in keywords if keyword in stopwords_list ]","title":"Overview"},{"location":"advanced_usage/control_filtering/overview/#control-filtering","text":"There may be many controls items in the application, which may not be relevant to the task. UFO can filter out the irrelevant controls and only focus on the relevant ones. This filtering process can reduce the complexity of the task. Execept for configuring the control types for selection on CONTROL_LIST in config_dev.yaml , UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currerntly support the following filtering methods: Filtering Method Description Text Filter the controls based on the control text. Semantic Filter the controls based on the semantic similarity. Icon Filter the controls based on the control icon image.","title":"Control Filtering"},{"location":"advanced_usage/control_filtering/overview/#configuration","text":"You can activate the control filtering by setting the CONTROL_FILTER in the config_dev.yaml file. The CONTROL_FILTER is a list of filtering methods that you want to apply to the controls, which can be TEXT , SEMANTIC , or ICON . You can configure multiple filtering methods in the CONTROL_FILTER list.","title":"Configuration"},{"location":"advanced_usage/control_filtering/overview/#reference","text":"The implementation of the control filtering is base on the BasicControlFilter class located in the ufo/automator/ui_control/control_filter.py file. Concrete filtering class inherit from the BasicControlFilter class and implement the control_filter method to filter the controls based on the specific filtering method. BasicControlFilter represents a model for filtering control items.","title":"Reference"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.__new__","text":"Creates a new instance of BasicControlFilter. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The BasicControlFilter instance. Source code in automator/ui_control/control_filter.py 72 73 74 75 76 77 78 79 80 81 82 def __new__ ( cls , model_path ): \"\"\" Creates a new instance of BasicControlFilter. :param model_path: The path to the model. :return: The BasicControlFilter instance. \"\"\" if model_path not in cls . _instances : instance = super ( BasicControlFilter , cls ) . __new__ ( cls ) instance . model = cls . load_model ( model_path ) cls . _instances [ model_path ] = instance return cls . _instances [ model_path ]","title":"__new__"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.control_filter","text":"Calculates the cosine similarity between the embeddings of the given keywords and the control item. Parameters: control_dicts \u2013 The control item to be compared with the plans. plans \u2013 The plans to be used for calculating the similarity. Returns: \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 104 105 106 107 108 109 110 111 112 @abstractmethod def control_filter ( self , control_dicts , plans , ** kwargs ): \"\"\" Calculates the cosine similarity between the embeddings of the given keywords and the control item. :param control_dicts: The control item to be compared with the plans. :param plans: The plans to be used for calculating the similarity. :return: The filtered control items. \"\"\" pass","title":"control_filter"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.cos_sim","text":"Computes the cosine similarity between two embeddings. Parameters: embedding1 \u2013 The first embedding. embedding2 \u2013 The second embedding. Returns: float \u2013 The cosine similarity between the two embeddings. Source code in automator/ui_control/control_filter.py 153 154 155 156 157 158 159 160 161 162 163 @staticmethod def cos_sim ( embedding1 , embedding2 ) -> float : \"\"\" Computes the cosine similarity between two embeddings. :param embedding1: The first embedding. :param embedding2: The second embedding. :return: The cosine similarity between the two embeddings. \"\"\" import sentence_transformers return sentence_transformers . util . cos_sim ( embedding1 , embedding2 )","title":"cos_sim"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.get_embedding","text":"Encodes the given object into an embedding. Parameters: content \u2013 The content to encode. Returns: \u2013 The embedding of the object. Source code in automator/ui_control/control_filter.py 95 96 97 98 99 100 101 102 def get_embedding ( self , content ): \"\"\" Encodes the given object into an embedding. :param content: The content to encode. :return: The embedding of the object. \"\"\" return self . model . encode ( content )","title":"get_embedding"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.load_model","text":"Loads the model from the given model path. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The loaded model. Source code in automator/ui_control/control_filter.py 84 85 86 87 88 89 90 91 92 93 @staticmethod def load_model ( model_path ): \"\"\" Loads the model from the given model path. :param model_path: The path to the model. :return: The loaded model. \"\"\" import sentence_transformers return sentence_transformers . SentenceTransformer ( model_path )","title":"load_model"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.plans_to_keywords","text":"Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. Parameters: plans ( List [ str ] ) \u2013 The plan to be parsed. Returns: List [ str ] \u2013 A list of keywords extracted from the plan. Source code in automator/ui_control/control_filter.py 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 @staticmethod def plans_to_keywords ( plans : List [ str ]) -> List [ str ]: \"\"\" Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. :param plans: The plan to be parsed. :return: A list of keywords extracted from the plan. \"\"\" keywords = [] for plan in plans : words = plan . replace ( \"'\" , \"\" ) . strip ( \".\" ) . split () words = [ word for word in words if word . isalpha () or bool ( re . fullmatch ( r \"[\\u4e00-\\u9fa5]+\" , word )) ] keywords . extend ( words ) return keywords","title":"plans_to_keywords"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.remove_stopwords","text":"Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). Parameters: keywords \u2013 The list of keywords to be filtered. Returns: \u2013 The list of keywords with the stopwords removed. Source code in automator/ui_control/control_filter.py 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 @staticmethod def remove_stopwords ( keywords ): \"\"\" Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). :param keywords: The list of keywords to be filtered. :return: The list of keywords with the stopwords removed. \"\"\" try : from nltk.corpus import stopwords stopwords_list = stopwords . words ( \"english\" ) except LookupError as e : import nltk nltk . download ( \"stopwords\" ) stopwords_list = nltk . corpus . stopwords . words ( \"english\" ) return [ keyword for keyword in keywords if keyword in stopwords_list ]","title":"remove_stopwords"},{"location":"advanced_usage/control_filtering/semantic_filtering/","text":"Sematic Control Filter The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings. Configuration To activate the semantic control filtering, you need to add SEMANTIC to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed sematic control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add SEMANTIC to the list. CONTROL_FILTER_TOP_K_SEMANTIC : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_SEMANTIC_NAME : The control filter model name for semantic similarity. By default, it is set to \"all-MiniLM-L6-v2\". Reference Bases: BasicControlFilter A class that represents a semantic model for control filtering. control_filter ( control_dicts , plans , top_k ) Filters control items based on their similarity to a set of keywords. Parameters: control_dicts \u2013 The dictionary of control items to be filtered. plans \u2013 The list of plans to be used for filtering. top_k \u2013 The number of top control items to return. Returns: \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 def control_filter ( self , control_dicts , plans , top_k ): \"\"\" Filters control items based on their similarity to a set of keywords. :param control_dicts: The dictionary of control items to be filtered. :param plans: The list of plans to be used for filtering. :param top_k: The number of top control items to return. :return: The filtered control items. \"\"\" scores_items = [] filtered_control_dict = {} for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () score = self . control_filter_score ( control_text , plans ) scores_items . append (( label , score )) topk_scores_items = heapq . nlargest ( top_k , ( scores_items ), key = lambda x : x [ 1 ]) topk_items = [ ( score_item [ 0 ], score_item [ 1 ]) for score_item in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_items : filtered_control_dict [ label ] = control_item return filtered_control_dict control_filter_score ( control_text , plans ) Calculates the score for a control item based on the similarity between its text and a set of keywords. Parameters: control_text \u2013 The text of the control item. plans \u2013 The plan to be used for calculating the similarity. Returns: \u2013 The score (0-1) indicating the similarity between the control text and the keywords. Source code in automator/ui_control/control_filter.py 197 198 199 200 201 202 203 204 205 206 207 def control_filter_score ( self , control_text , plans ): \"\"\" Calculates the score for a control item based on the similarity between its text and a set of keywords. :param control_text: The text of the control item. :param plans: The plan to be used for calculating the similarity. :return: The score (0-1) indicating the similarity between the control text and the keywords. \"\"\" plan_embedding = self . get_embedding ( plans ) control_text_embedding = self . get_embedding ( control_text ) return max ( self . cos_sim ( control_text_embedding , plan_embedding ) . tolist ()[ 0 ])","title":"Semantic Filtering"},{"location":"advanced_usage/control_filtering/semantic_filtering/#sematic-control-filter","text":"The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings.","title":"Sematic Control Filter"},{"location":"advanced_usage/control_filtering/semantic_filtering/#configuration","text":"To activate the semantic control filtering, you need to add SEMANTIC to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed sematic control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add SEMANTIC to the list. CONTROL_FILTER_TOP_K_SEMANTIC : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_SEMANTIC_NAME : The control filter model name for semantic similarity. By default, it is set to \"all-MiniLM-L6-v2\".","title":"Configuration"},{"location":"advanced_usage/control_filtering/semantic_filtering/#reference","text":"Bases: BasicControlFilter A class that represents a semantic model for control filtering.","title":"Reference"},{"location":"advanced_usage/control_filtering/semantic_filtering/#automator.ui_control.control_filter.SemanticControlFilter.control_filter","text":"Filters control items based on their similarity to a set of keywords. Parameters: control_dicts \u2013 The dictionary of control items to be filtered. plans \u2013 The list of plans to be used for filtering. top_k \u2013 The number of top control items to return. Returns: \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 def control_filter ( self , control_dicts , plans , top_k ): \"\"\" Filters control items based on their similarity to a set of keywords. :param control_dicts: The dictionary of control items to be filtered. :param plans: The list of plans to be used for filtering. :param top_k: The number of top control items to return. :return: The filtered control items. \"\"\" scores_items = [] filtered_control_dict = {} for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () score = self . control_filter_score ( control_text , plans ) scores_items . append (( label , score )) topk_scores_items = heapq . nlargest ( top_k , ( scores_items ), key = lambda x : x [ 1 ]) topk_items = [ ( score_item [ 0 ], score_item [ 1 ]) for score_item in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_items : filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"control_filter"},{"location":"advanced_usage/control_filtering/semantic_filtering/#automator.ui_control.control_filter.SemanticControlFilter.control_filter_score","text":"Calculates the score for a control item based on the similarity between its text and a set of keywords. Parameters: control_text \u2013 The text of the control item. plans \u2013 The plan to be used for calculating the similarity. Returns: \u2013 The score (0-1) indicating the similarity between the control text and the keywords. Source code in automator/ui_control/control_filter.py 197 198 199 200 201 202 203 204 205 206 207 def control_filter_score ( self , control_text , plans ): \"\"\" Calculates the score for a control item based on the similarity between its text and a set of keywords. :param control_text: The text of the control item. :param plans: The plan to be used for calculating the similarity. :return: The score (0-1) indicating the similarity between the control text and the keywords. \"\"\" plan_embedding = self . get_embedding ( plans ) control_text_embedding = self . get_embedding ( control_text ) return max ( self . cos_sim ( control_text_embedding , plan_embedding ) . tolist ()[ 0 ])","title":"control_filter_score"},{"location":"advanced_usage/control_filtering/text_filtering/","text":"Text Control Filter The text control filter is a method to filter the controls based on the control text. The agent's plan on the current step usually contains some keywords or phrases. This method filters the controls based on the matching between the control text and the keywords or phrases in the agent's plan. Configuration To activate the text control filtering, you need to add TEXT to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed text control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add TEXT to the list. CONTROL_FILTER_TOP_K_PLAN : The number of agent's plan keywords or phrases to use for filtering the controls. Reference A class that provides methods for filtering control items based on plans. control_filter ( control_dicts , plans ) staticmethod Filters control items based on keywords. Parameters: control_dicts ( Dict ) \u2013 The dictionary of control items to be filtered. plans ( List [ str ] ) \u2013 The list of plans to be used for filtering. Returns: Dict \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 @staticmethod def control_filter ( control_dicts : Dict , plans : List [ str ]) -> Dict : \"\"\" Filters control items based on keywords. :param control_dicts: The dictionary of control items to be filtered. :param plans: The list of plans to be used for filtering. :return: The filtered control items. \"\"\" filtered_control_dict = {} keywords = BasicControlFilter . plans_to_keywords ( plans ) for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () if any ( keyword in control_text or control_text in keyword for keyword in keywords ): filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"Text Filtering"},{"location":"advanced_usage/control_filtering/text_filtering/#text-control-filter","text":"The text control filter is a method to filter the controls based on the control text. The agent's plan on the current step usually contains some keywords or phrases. This method filters the controls based on the matching between the control text and the keywords or phrases in the agent's plan.","title":"Text Control Filter"},{"location":"advanced_usage/control_filtering/text_filtering/#configuration","text":"To activate the text control filtering, you need to add TEXT to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed text control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add TEXT to the list. CONTROL_FILTER_TOP_K_PLAN : The number of agent's plan keywords or phrases to use for filtering the controls.","title":"Configuration"},{"location":"advanced_usage/control_filtering/text_filtering/#reference","text":"A class that provides methods for filtering control items based on plans.","title":"Reference"},{"location":"advanced_usage/control_filtering/text_filtering/#automator.ui_control.control_filter.TextControlFilter.control_filter","text":"Filters control items based on keywords. Parameters: control_dicts ( Dict ) \u2013 The dictionary of control items to be filtered. plans ( List [ str ] ) \u2013 The list of plans to be used for filtering. Returns: Dict \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 @staticmethod def control_filter ( control_dicts : Dict , plans : List [ str ]) -> Dict : \"\"\" Filters control items based on keywords. :param control_dicts: The dictionary of control items to be filtered. :param plans: The list of plans to be used for filtering. :return: The filtered control items. \"\"\" filtered_control_dict = {} keywords = BasicControlFilter . plans_to_keywords ( plans ) for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () if any ( keyword in control_text or control_text in keyword for keyword in keywords ): filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"control_filter"},{"location":"advanced_usage/reinforce_appagent/experience_learning/","text":"Learning from Self-Experience When UFO successfully completes a task, user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future. Mechanism Step 1: Complete a Session Event : UFO completes a session Step 2: Ask User to Save Experience Action : The agent prompts the user with a choice to save the successful experience Step 3: User Chooses to Save Action : If the user chooses to save the experience Step 4: Summarize and Save the Experience Tool : ExperienceSummarizer Process : Summarize the experience into a demonstration example Save the demonstration example in the EXPERIENCE_SAVED_PATH as specified in the config_dev.yaml file The demonstration example includes similar fields as those used in the AppAgent's prompt Step 5: Retrieve and Utilize Saved Experience When : The AppAgent encounters a similar task in the future Action : Retrieve the saved experience from the experience database Outcome : Use the retrieved experience to generate a plan Workflow Diagram graph TD; A[Complete Session] --> B[Ask User to Save Experience] B --> C[User Chooses to Save] C --> D[Summarize with ExperienceSummarizer] D --> E[Save in EXPERIENCE_SAVED_PATH] F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience] G --> H[Generate Plan] Activate the Learning from Self-Experience Step 1: Configure the AppAgent Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 Reference Experience Summarizer The ExperienceSummarizer class is located in the ufo/experience/experience_summarizer.py file. The ExperienceSummarizer class provides the following methods to summarize the experience: The ExperienceSummarizer class is the summarizer for the experience learning. Initialize the ApplicationAgentPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. example_prompt_template ( str ) \u2013 The path of the example prompt template. api_prompt_template ( str ) \u2013 The path of the api prompt template. Source code in experience/summarizer.py 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 def __init__ ( self , is_visual : bool , prompt_template : str , example_prompt_template : str , api_prompt_template : str , ): \"\"\" Initialize the ApplicationAgentPrompter. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param example_prompt_template: The path of the example prompt template. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . example_prompt_template = example_prompt_template self . api_prompt_template = api_prompt_template build_prompt ( log_partition ) Build the prompt. Parameters: log_partition ( dict ) \u2013 The log partition. return: The prompt. Source code in experience/summarizer.py 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def build_prompt ( self , log_partition : dict ) -> list : \"\"\" Build the prompt. :param log_partition: The log partition. return: The prompt. \"\"\" experience_prompter = ExperiencePrompter ( self . is_visual , self . prompt_template , self . example_prompt_template , self . api_prompt_template , ) experience_system_prompt = experience_prompter . system_prompt_construction () experience_user_prompt = experience_prompter . user_content_construction ( log_partition ) experience_prompt = experience_prompter . prompt_construction ( experience_system_prompt , experience_user_prompt ) return experience_prompt create_or_update_vector_db ( summaries , db_path ) staticmethod Create or update the vector database. Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. Source code in experience/summarizer.py 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" ) create_or_update_yaml ( summaries , yaml_path ) staticmethod Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. Source code in experience/summarizer.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" ) get_summary ( prompt_message ) Get the summary. Parameters: prompt_message ( list ) \u2013 The prompt message. return: The summary and the cost. Source code in experience/summarizer.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_summary ( self , prompt_message : list ) -> Tuple [ dict , float ]: \"\"\" Get the summary. :param prompt_message: The prompt message. return: The summary and the cost. \"\"\" # Get the completion for the prompt message response_string , cost = get_completion ( prompt_message , \"APPAGENT\" , use_backup_engine = True ) try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary , cost get_summary_list ( logs ) Get the summary list. Parameters: logs ( list ) \u2013 The logs. return: The summary list and the total cost. Source code in experience/summarizer.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 def get_summary_list ( self , logs : list ) -> Tuple [ list , float ]: \"\"\" Get the summary list. :param logs: The logs. return: The summary list and the total cost. \"\"\" summaries = [] total_cost = 0.0 for log_partition in logs : prompt = self . build_prompt ( log_partition ) summary , cost = self . get_summary ( prompt ) summary [ \"request\" ] = ExperienceLogLoader . get_user_request ( log_partition ) summary [ \"app_list\" ] = ExperienceLogLoader . get_app_list ( log_partition ) summaries . append ( summary ) total_cost += cost return summaries , total_cost read_logs ( log_path ) staticmethod Read the log. Parameters: log_path ( str ) \u2013 The path of the log file. Source code in experience/summarizer.py 117 118 119 120 121 122 123 124 125 @staticmethod def read_logs ( log_path : str ) -> list : \"\"\" Read the log. :param log_path: The path of the log file. \"\"\" replay_loader = ExperienceLogLoader ( log_path ) logs = replay_loader . create_logs () return logs Experience Retriever The ExperienceRetriever class is located in the ufo/rag/retriever.py file. The ExperienceRetriever class provides the following methods to retrieve the experience: Bases: Retriever Class to create experience retrievers. Create a new ExperienceRetriever. Parameters: db_path \u2013 The path to the database. Source code in rag/retriever.py 131 132 133 134 135 136 def __init__ ( self , db_path ) -> None : \"\"\" Create a new ExperienceRetriever. :param db_path: The path to the database. \"\"\" self . indexer = self . get_indexer ( db_path ) get_indexer ( db_path ) Create an experience indexer. Parameters: db_path ( str ) \u2013 The path to the database. Source code in rag/retriever.py 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 def get_indexer ( self , db_path : str ): \"\"\" Create an experience indexer. :param db_path: The path to the database. \"\"\" try : db = FAISS . load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load experience indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"Experience Learning"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#learning-from-self-experience","text":"When UFO successfully completes a task, user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future.","title":"Learning from Self-Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#mechanism","text":"","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-1-complete-a-session","text":"Event : UFO completes a session","title":"Step 1: Complete a Session"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-2-ask-user-to-save-experience","text":"Action : The agent prompts the user with a choice to save the successful experience","title":"Step 2: Ask User to Save Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-3-user-chooses-to-save","text":"Action : If the user chooses to save the experience","title":"Step 3: User Chooses to Save"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-4-summarize-and-save-the-experience","text":"Tool : ExperienceSummarizer Process : Summarize the experience into a demonstration example Save the demonstration example in the EXPERIENCE_SAVED_PATH as specified in the config_dev.yaml file The demonstration example includes similar fields as those used in the AppAgent's prompt","title":"Step 4: Summarize and Save the Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-5-retrieve-and-utilize-saved-experience","text":"When : The AppAgent encounters a similar task in the future Action : Retrieve the saved experience from the experience database Outcome : Use the retrieved experience to generate a plan","title":"Step 5: Retrieve and Utilize Saved Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#workflow-diagram","text":"graph TD; A[Complete Session] --> B[Ask User to Save Experience] B --> C[User Chooses to Save] C --> D[Summarize with ExperienceSummarizer] D --> E[Save in EXPERIENCE_SAVED_PATH] F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience] G --> H[Generate Plan]","title":"Workflow Diagram"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#activate-the-learning-from-self-experience","text":"","title":"Activate the Learning from Self-Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-1-configure-the-appagent","text":"Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5","title":"Step 1: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#reference","text":"","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience-summarizer","text":"The ExperienceSummarizer class is located in the ufo/experience/experience_summarizer.py file. The ExperienceSummarizer class provides the following methods to summarize the experience: The ExperienceSummarizer class is the summarizer for the experience learning. Initialize the ApplicationAgentPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. example_prompt_template ( str ) \u2013 The path of the example prompt template. api_prompt_template ( str ) \u2013 The path of the api prompt template. Source code in experience/summarizer.py 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 def __init__ ( self , is_visual : bool , prompt_template : str , example_prompt_template : str , api_prompt_template : str , ): \"\"\" Initialize the ApplicationAgentPrompter. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param example_prompt_template: The path of the example prompt template. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . example_prompt_template = example_prompt_template self . api_prompt_template = api_prompt_template","title":"Experience Summarizer"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.build_prompt","text":"Build the prompt. Parameters: log_partition ( dict ) \u2013 The log partition. return: The prompt. Source code in experience/summarizer.py 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def build_prompt ( self , log_partition : dict ) -> list : \"\"\" Build the prompt. :param log_partition: The log partition. return: The prompt. \"\"\" experience_prompter = ExperiencePrompter ( self . is_visual , self . prompt_template , self . example_prompt_template , self . api_prompt_template , ) experience_system_prompt = experience_prompter . system_prompt_construction () experience_user_prompt = experience_prompter . user_content_construction ( log_partition ) experience_prompt = experience_prompter . prompt_construction ( experience_system_prompt , experience_user_prompt ) return experience_prompt","title":"build_prompt"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.create_or_update_vector_db","text":"Create or update the vector database. Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. Source code in experience/summarizer.py 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" )","title":"create_or_update_vector_db"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.create_or_update_yaml","text":"Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. Source code in experience/summarizer.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" )","title":"create_or_update_yaml"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.get_summary","text":"Get the summary. Parameters: prompt_message ( list ) \u2013 The prompt message. return: The summary and the cost. Source code in experience/summarizer.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_summary ( self , prompt_message : list ) -> Tuple [ dict , float ]: \"\"\" Get the summary. :param prompt_message: The prompt message. return: The summary and the cost. \"\"\" # Get the completion for the prompt message response_string , cost = get_completion ( prompt_message , \"APPAGENT\" , use_backup_engine = True ) try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary , cost","title":"get_summary"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.get_summary_list","text":"Get the summary list. Parameters: logs ( list ) \u2013 The logs. return: The summary list and the total cost. Source code in experience/summarizer.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 def get_summary_list ( self , logs : list ) -> Tuple [ list , float ]: \"\"\" Get the summary list. :param logs: The logs. return: The summary list and the total cost. \"\"\" summaries = [] total_cost = 0.0 for log_partition in logs : prompt = self . build_prompt ( log_partition ) summary , cost = self . get_summary ( prompt ) summary [ \"request\" ] = ExperienceLogLoader . get_user_request ( log_partition ) summary [ \"app_list\" ] = ExperienceLogLoader . get_app_list ( log_partition ) summaries . append ( summary ) total_cost += cost return summaries , total_cost","title":"get_summary_list"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.read_logs","text":"Read the log. Parameters: log_path ( str ) \u2013 The path of the log file. Source code in experience/summarizer.py 117 118 119 120 121 122 123 124 125 @staticmethod def read_logs ( log_path : str ) -> list : \"\"\" Read the log. :param log_path: The path of the log file. \"\"\" replay_loader = ExperienceLogLoader ( log_path ) logs = replay_loader . create_logs () return logs","title":"read_logs"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience-retriever","text":"The ExperienceRetriever class is located in the ufo/rag/retriever.py file. The ExperienceRetriever class provides the following methods to retrieve the experience: Bases: Retriever Class to create experience retrievers. Create a new ExperienceRetriever. Parameters: db_path \u2013 The path to the database. Source code in rag/retriever.py 131 132 133 134 135 136 def __init__ ( self , db_path ) -> None : \"\"\" Create a new ExperienceRetriever. :param db_path: The path to the database. \"\"\" self . indexer = self . get_indexer ( db_path )","title":"Experience Retriever"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#rag.retriever.ExperienceRetriever.get_indexer","text":"Create an experience indexer. Parameters: db_path ( str ) \u2013 The path to the database. Source code in rag/retriever.py 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 def get_indexer ( self , db_path : str ): \"\"\" Create an experience indexer. :param db_path: The path to the database. \"\"\" try : db = FAISS . load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load experience indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/","text":"Learning from Bing Search UFO provides the capability to reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge for niche tasks or applications which beyond the AppAgent 's knowledge. Mechanism Upon receiving a request, the AppAgent constructs a Bing search query based on the request and retrieves the search results from Bing. The AppAgent then extracts the relevant information from the top-k search results from Bing and generates a plan based on the retrieved information. Activate the Learning from Bing Search Step 1: Obtain Bing API Key To use the Bing search, you need to obtain a Bing API key. You can follow the instructions on the Microsoft Azure Bing Search API to get the API key. Step 2: Configure the AppAgent Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1 Reference Bases: Retriever Class to create online retrievers. Create a new OfflineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. Source code in rag/retriever.py 162 163 164 165 166 167 168 169 def __init__ ( self , query : str , top_k : int ) -> None : \"\"\" Create a new OfflineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. \"\"\" self . query = query self . indexer = self . get_indexer ( top_k ) get_indexer ( top_k ) Create an online search indexer. Parameters: top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The created indexer. Source code in rag/retriever.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 def get_indexer ( self , top_k : int ): \"\"\" Create an online search indexer. :param top_k: The number of documents to retrieve. :return: The created indexer. \"\"\" bing_retriever = web_search . BingSearchWeb () result_list = bing_retriever . search ( self . query , top_k = top_k ) documents = bing_retriever . create_documents ( result_list ) if len ( documents ) == 0 : return None indexer = bing_retriever . create_indexer ( documents ) print_with_color ( \"Online indexer created successfully for {num} searched results.\" . format ( num = len ( documents ) ), \"cyan\" , ) return indexer","title":"Learning from Bing Search"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#learning-from-bing-search","text":"UFO provides the capability to reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge for niche tasks or applications which beyond the AppAgent 's knowledge.","title":"Learning from Bing Search"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#mechanism","text":"Upon receiving a request, the AppAgent constructs a Bing search query based on the request and retrieves the search results from Bing. The AppAgent then extracts the relevant information from the top-k search results from Bing and generates a plan based on the retrieved information.","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#activate-the-learning-from-bing-search","text":"","title":"Activate the Learning from Bing Search"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#step-1-obtain-bing-api-key","text":"To use the Bing search, you need to obtain a Bing API key. You can follow the instructions on the Microsoft Azure Bing Search API to get the API key.","title":"Step 1: Obtain Bing API Key"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#step-2-configure-the-appagent","text":"Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1","title":"Step 2: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#reference","text":"Bases: Retriever Class to create online retrievers. Create a new OfflineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. Source code in rag/retriever.py 162 163 164 165 166 167 168 169 def __init__ ( self , query : str , top_k : int ) -> None : \"\"\" Create a new OfflineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. \"\"\" self . query = query self . indexer = self . get_indexer ( top_k )","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#rag.retriever.OnlineDocRetriever.get_indexer","text":"Create an online search indexer. Parameters: top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The created indexer. Source code in rag/retriever.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 def get_indexer ( self , top_k : int ): \"\"\" Create an online search indexer. :param top_k: The number of documents to retrieve. :return: The created indexer. \"\"\" bing_retriever = web_search . BingSearchWeb () result_list = bing_retriever . search ( self . query , top_k = top_k ) documents = bing_retriever . create_documents ( result_list ) if len ( documents ) == 0 : return None indexer = bing_retriever . create_indexer ( documents ) print_with_color ( \"Online indexer created successfully for {num} searched results.\" . format ( num = len ( documents ) ), \"cyan\" , ) return indexer","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/","text":"Here is the polished document for your Python code project: Learning from User Demonstration For complex tasks, users can demonstrate the task using Step Recorder to record the action trajectories. UFO can learn from these user demonstrations to improve the AppAgent's performance. Mechanism UFO use the Step Recorder tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The DemonstrationSummarizer class extracts and summarizes the demonstration. The summarized demonstration is saved in the DEMONSTRATION_SAVED_PATH as specified in the config_dev.yaml file. When the AppAgent encounters a similar task, the DemonstrationRetriever class retrieves the saved demonstration from the demonstration database and generates a plan based on the retrieved demonstration. Info You can find how to record the task and action trajectories using the Step Recorder tool in the User Demonstration Provision document. You can find a demo video of learning from user demonstrations: Activating Learning from User Demonstrations Step 1: User Demonstration Please follow the steps in the User Demonstration Provision document to provide user demonstrations. Step 2: Configure the AppAgent Configure the following parameters to allow UFO to use RAG from user demonstrations: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use RAG from user demonstrations Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The top K documents to retrieve offline Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion choices for the demonstration result Integer 3 Reference Demonstration Summarizer The DemonstrationSummarizer class is located in the record_processor/summarizer/summarizer.py file. The DemonstrationSummarizer class provides methods to summarize the demonstration: The DemonstrationSummarizer class is the summarizer for the demonstration learning. It summarizes the demonstration record to a list of summaries, and save the summaries to the YAML file and the vector database. A sample of the summary is as follows: { \"example\": { \"Observation\": \"Word.exe is opened.\", \"Thought\": \"The user is trying to create a new file.\", \"ControlLabel\": \"1\", \"ControlText\": \"Sample Control Text\", \"Function\": \"CreateFile\", \"Args\": \"filename='new_file.txt'\", \"Status\": \"Success\", \"Plan\": \"Create a new file named 'new_file.txt'.\", \"Comment\": \"The user successfully created a new file.\" }, \"Tips\": \"You can use the 'CreateFile' function to create a new file.\" } Initialize the DemonstrationSummarizer. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. demonstration_prompt_template ( str ) \u2013 The path of the example prompt template for demonstration. api_prompt_template ( str ) \u2013 The path of the api prompt template. Source code in summarizer/summarizer.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 def __init__ ( self , is_visual : bool , prompt_template : str , demonstration_prompt_template : str , api_prompt_template : str , completion_num : int = 1 , ): \"\"\" Initialize the DemonstrationSummarizer. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param demonstration_prompt_template: The path of the example prompt template for demonstration. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . demonstration_prompt_template = demonstration_prompt_template self . api_prompt_template = api_prompt_template self . completion_num = completion_num __build_prompt ( demo_record ) Build the prompt by the user demonstration record. Parameters: demo_record ( DemonstrationRecord ) \u2013 The user demonstration record. return: The prompt. Source code in summarizer/summarizer.py 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 def __build_prompt ( self , demo_record : DemonstrationRecord ) -> list : \"\"\" Build the prompt by the user demonstration record. :param demo_record: The user demonstration record. return: The prompt. \"\"\" demonstration_prompter = DemonstrationPrompter ( self . is_visual , self . prompt_template , self . demonstration_prompt_template , self . api_prompt_template , ) demonstration_system_prompt = ( demonstration_prompter . system_prompt_construction () ) demonstration_user_prompt = demonstration_prompter . user_content_construction ( demo_record ) demonstration_prompt = demonstration_prompter . prompt_construction ( demonstration_system_prompt , demonstration_user_prompt ) return demonstration_prompt __parse_response ( response_string ) Parse the response string to a dict of summary. Parameters: response_string ( str ) \u2013 The response string. return: The summary dict. Source code in summarizer/summarizer.py 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 def __parse_response ( self , response_string : str ) -> dict : \"\"\" Parse the response string to a dict of summary. :param response_string: The response string. return: The summary dict. \"\"\" try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response, in case any of the keys are missing, set them to empty string. if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary create_or_update_vector_db ( summaries , db_path ) staticmethod Create or update the vector database. Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. Source code in summarizer/summarizer.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" ) create_or_update_yaml ( summaries , yaml_path ) staticmethod Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. Source code in summarizer/summarizer.py 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" ) get_summary_list ( record ) Get the summary list for a record Parameters: record ( DemonstrationRecord ) \u2013 The demonstration record. return: The summary list for the user defined completion number and the cost Source code in summarizer/summarizer.py 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 def get_summary_list ( self , record : DemonstrationRecord ) -> Tuple [ list , float ]: \"\"\" Get the summary list for a record :param record: The demonstration record. return: The summary list for the user defined completion number and the cost \"\"\" prompt = self . __build_prompt ( record ) response_string_list , cost = get_completions ( prompt , \"APPAGENT\" , use_backup_engine = True , n = self . completion_num ) summaries = [] for response_string in response_string_list : summary = self . __parse_response ( response_string ) if summary : summary [ \"request\" ] = record . get_request () summary [ \"app_list\" ] = record . get_applications () summaries . append ( summary ) return summaries , cost Demonstration Retriever The DemonstrationRetriever class is located in the rag/retriever.py file. The DemonstrationRetriever class provides methods to retrieve the demonstration: Bases: Retriever Class to create demonstration retrievers. Create a new DemonstrationRetriever. :db_path: The path to the database. Source code in rag/retriever.py 198 199 200 201 202 203 def __init__ ( self , db_path ) -> None : \"\"\" Create a new DemonstrationRetriever. :db_path: The path to the database. \"\"\" self . indexer = self . get_indexer ( db_path ) get_indexer ( db_path ) Create a demonstration indexer. :db_path: The path to the database. Source code in rag/retriever.py 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 def get_indexer ( self , db_path : str ): \"\"\" Create a demonstration indexer. :db_path: The path to the database. \"\"\" try : db = FAISS . load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load demonstration indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"Learning from User Demonstration"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#learning-from-user-demonstration","text":"For complex tasks, users can demonstrate the task using Step Recorder to record the action trajectories. UFO can learn from these user demonstrations to improve the AppAgent's performance.","title":"Learning from User Demonstration"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#mechanism","text":"UFO use the Step Recorder tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The DemonstrationSummarizer class extracts and summarizes the demonstration. The summarized demonstration is saved in the DEMONSTRATION_SAVED_PATH as specified in the config_dev.yaml file. When the AppAgent encounters a similar task, the DemonstrationRetriever class retrieves the saved demonstration from the demonstration database and generates a plan based on the retrieved demonstration. Info You can find how to record the task and action trajectories using the Step Recorder tool in the User Demonstration Provision document. You can find a demo video of learning from user demonstrations:","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#activating-learning-from-user-demonstrations","text":"","title":"Activating Learning from User Demonstrations"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#step-1-user-demonstration","text":"Please follow the steps in the User Demonstration Provision document to provide user demonstrations.","title":"Step 1: User Demonstration"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#step-2-configure-the-appagent","text":"Configure the following parameters to allow UFO to use RAG from user demonstrations: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use RAG from user demonstrations Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The top K documents to retrieve offline Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion choices for the demonstration result Integer 3","title":"Step 2: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#reference","text":"","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#demonstration-summarizer","text":"The DemonstrationSummarizer class is located in the record_processor/summarizer/summarizer.py file. The DemonstrationSummarizer class provides methods to summarize the demonstration: The DemonstrationSummarizer class is the summarizer for the demonstration learning. It summarizes the demonstration record to a list of summaries, and save the summaries to the YAML file and the vector database. A sample of the summary is as follows: { \"example\": { \"Observation\": \"Word.exe is opened.\", \"Thought\": \"The user is trying to create a new file.\", \"ControlLabel\": \"1\", \"ControlText\": \"Sample Control Text\", \"Function\": \"CreateFile\", \"Args\": \"filename='new_file.txt'\", \"Status\": \"Success\", \"Plan\": \"Create a new file named 'new_file.txt'.\", \"Comment\": \"The user successfully created a new file.\" }, \"Tips\": \"You can use the 'CreateFile' function to create a new file.\" } Initialize the DemonstrationSummarizer. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. demonstration_prompt_template ( str ) \u2013 The path of the example prompt template for demonstration. api_prompt_template ( str ) \u2013 The path of the api prompt template. Source code in summarizer/summarizer.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 def __init__ ( self , is_visual : bool , prompt_template : str , demonstration_prompt_template : str , api_prompt_template : str , completion_num : int = 1 , ): \"\"\" Initialize the DemonstrationSummarizer. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param demonstration_prompt_template: The path of the example prompt template for demonstration. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . demonstration_prompt_template = demonstration_prompt_template self . api_prompt_template = api_prompt_template self . completion_num = completion_num","title":"Demonstration Summarizer"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.__build_prompt","text":"Build the prompt by the user demonstration record. Parameters: demo_record ( DemonstrationRecord ) \u2013 The user demonstration record. return: The prompt. Source code in summarizer/summarizer.py 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 def __build_prompt ( self , demo_record : DemonstrationRecord ) -> list : \"\"\" Build the prompt by the user demonstration record. :param demo_record: The user demonstration record. return: The prompt. \"\"\" demonstration_prompter = DemonstrationPrompter ( self . is_visual , self . prompt_template , self . demonstration_prompt_template , self . api_prompt_template , ) demonstration_system_prompt = ( demonstration_prompter . system_prompt_construction () ) demonstration_user_prompt = demonstration_prompter . user_content_construction ( demo_record ) demonstration_prompt = demonstration_prompter . prompt_construction ( demonstration_system_prompt , demonstration_user_prompt ) return demonstration_prompt","title":"__build_prompt"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.__parse_response","text":"Parse the response string to a dict of summary. Parameters: response_string ( str ) \u2013 The response string. return: The summary dict. Source code in summarizer/summarizer.py 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 def __parse_response ( self , response_string : str ) -> dict : \"\"\" Parse the response string to a dict of summary. :param response_string: The response string. return: The summary dict. \"\"\" try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response, in case any of the keys are missing, set them to empty string. if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary","title":"__parse_response"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.create_or_update_vector_db","text":"Create or update the vector database. Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. Source code in summarizer/summarizer.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" )","title":"create_or_update_vector_db"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.create_or_update_yaml","text":"Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. Source code in summarizer/summarizer.py 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" )","title":"create_or_update_yaml"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.get_summary_list","text":"Get the summary list for a record Parameters: record ( DemonstrationRecord ) \u2013 The demonstration record. return: The summary list for the user defined completion number and the cost Source code in summarizer/summarizer.py 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 def get_summary_list ( self , record : DemonstrationRecord ) -> Tuple [ list , float ]: \"\"\" Get the summary list for a record :param record: The demonstration record. return: The summary list for the user defined completion number and the cost \"\"\" prompt = self . __build_prompt ( record ) response_string_list , cost = get_completions ( prompt , \"APPAGENT\" , use_backup_engine = True , n = self . completion_num ) summaries = [] for response_string in response_string_list : summary = self . __parse_response ( response_string ) if summary : summary [ \"request\" ] = record . get_request () summary [ \"app_list\" ] = record . get_applications () summaries . append ( summary ) return summaries , cost","title":"get_summary_list"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#demonstration-retriever","text":"The DemonstrationRetriever class is located in the rag/retriever.py file. The DemonstrationRetriever class provides methods to retrieve the demonstration: Bases: Retriever Class to create demonstration retrievers. Create a new DemonstrationRetriever. :db_path: The path to the database. Source code in rag/retriever.py 198 199 200 201 202 203 def __init__ ( self , db_path ) -> None : \"\"\" Create a new DemonstrationRetriever. :db_path: The path to the database. \"\"\" self . indexer = self . get_indexer ( db_path )","title":"Demonstration Retriever"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#rag.retriever.DemonstrationRetriever.get_indexer","text":"Create a demonstration indexer. :db_path: The path to the database. Source code in rag/retriever.py 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 def get_indexer ( self , db_path : str ): \"\"\" Create a demonstration indexer. :db_path: The path to the database. \"\"\" try : db = FAISS . load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load demonstration indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/","text":"Learning from Help Documents User or applications can provide help documents to the AppAgent to reinforce its capabilities. The AppAgent can retrieve knowledge from these documents to improve its understanding of the task, generate high-quality plans, and interact more efficiently with the application. You can find how to provide help documents to the AppAgent in the Help Document Provision section. Mechanism The help documents are provided in a format of task-solution pairs . Upon receiving a request, the AppAgent retrieves the relevant help documents by matching the request with the task descriptions in the help documents and generates a plan based on the retrieved solutions. Note Since the retrieved help documents may not be relevant to the request, the AppAgent will only take them as references to generate the plan. Activate the Learning from Help Documents Follow the steps below to activate the learning from help documents: Step 1: Provide Help Documents Please follow the steps in the Help Document Provision document to provide help documents to the AppAgent. Step 2: Configure the AppAgent Configure the following parameters in the config.yaml file to activate the learning from help documents: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 1 Reference Bases: Retriever Class to create offline retrievers. Create a new OfflineDocRetriever. :appname: The name of the application. Source code in rag/retriever.py 78 79 80 81 82 83 84 85 def __init__ ( self , app_name : str ) -> None : \"\"\" Create a new OfflineDocRetriever. :appname: The name of the application. \"\"\" self . app_name = app_name indexer_path = self . get_offline_indexer_path () self . indexer = self . get_indexer ( indexer_path ) get_indexer ( path ) Load the retriever. Parameters: path ( str ) \u2013 The path to load the retriever from. Returns: \u2013 The loaded retriever. Source code in rag/retriever.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 def get_indexer ( self , path : str ): \"\"\" Load the retriever. :param path: The path to load the retriever from. :return: The loaded retriever. \"\"\" if path : print_with_color ( \"Loading offline indexer from {path} ...\" . format ( path = path ), \"cyan\" ) else : return None try : db = FAISS . load_local ( path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load offline indexer from {path}.\".format( # path=path # ), # \"yellow\", # ) return None get_offline_indexer_path () Get the path to the offline indexer. Returns: \u2013 The path to the offline indexer. Source code in rag/retriever.py 87 88 89 90 91 92 93 94 95 96 97 def get_offline_indexer_path ( self ): \"\"\" Get the path to the offline indexer. :return: The path to the offline indexer. \"\"\" offline_records = get_offline_learner_indexer_config () for key in offline_records : if key . lower () in self . app_name . lower (): return offline_records [ key ] return None","title":"Learning from Help Document"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#learning-from-help-documents","text":"User or applications can provide help documents to the AppAgent to reinforce its capabilities. The AppAgent can retrieve knowledge from these documents to improve its understanding of the task, generate high-quality plans, and interact more efficiently with the application. You can find how to provide help documents to the AppAgent in the Help Document Provision section.","title":"Learning from Help Documents"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#mechanism","text":"The help documents are provided in a format of task-solution pairs . Upon receiving a request, the AppAgent retrieves the relevant help documents by matching the request with the task descriptions in the help documents and generates a plan based on the retrieved solutions. Note Since the retrieved help documents may not be relevant to the request, the AppAgent will only take them as references to generate the plan.","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#activate-the-learning-from-help-documents","text":"Follow the steps below to activate the learning from help documents:","title":"Activate the Learning from Help Documents"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#step-1-provide-help-documents","text":"Please follow the steps in the Help Document Provision document to provide help documents to the AppAgent.","title":"Step 1: Provide Help Documents"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#step-2-configure-the-appagent","text":"Configure the following parameters in the config.yaml file to activate the learning from help documents: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 1","title":"Step 2: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#reference","text":"Bases: Retriever Class to create offline retrievers. Create a new OfflineDocRetriever. :appname: The name of the application. Source code in rag/retriever.py 78 79 80 81 82 83 84 85 def __init__ ( self , app_name : str ) -> None : \"\"\" Create a new OfflineDocRetriever. :appname: The name of the application. \"\"\" self . app_name = app_name indexer_path = self . get_offline_indexer_path () self . indexer = self . get_indexer ( indexer_path )","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#rag.retriever.OfflineDocRetriever.get_indexer","text":"Load the retriever. Parameters: path ( str ) \u2013 The path to load the retriever from. Returns: \u2013 The loaded retriever. Source code in rag/retriever.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 def get_indexer ( self , path : str ): \"\"\" Load the retriever. :param path: The path to load the retriever from. :return: The loaded retriever. \"\"\" if path : print_with_color ( \"Loading offline indexer from {path} ...\" . format ( path = path ), \"cyan\" ) else : return None try : db = FAISS . load_local ( path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load offline indexer from {path}.\".format( # path=path # ), # \"yellow\", # ) return None","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#rag.retriever.OfflineDocRetriever.get_offline_indexer_path","text":"Get the path to the offline indexer. Returns: \u2013 The path to the offline indexer. Source code in rag/retriever.py 87 88 89 90 91 92 93 94 95 96 97 def get_offline_indexer_path ( self ): \"\"\" Get the path to the offline indexer. :return: The path to the offline indexer. \"\"\" offline_records = get_offline_learner_indexer_config () for key in offline_records : if key . lower () in self . app_name . lower (): return offline_records [ key ] return None","title":"get_offline_indexer_path"},{"location":"advanced_usage/reinforce_appagent/overview/","text":"Reinforcing AppAgent UFO provides versatile mechanisms to reinforce the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other techniques. These enhance the AppAgent's understanding of the task, improving the quality of the generated plans, and increasing the efficiency of the AppAgent's interactions with the application. We currently support the following reinforcement methods: Reinforcement Method Description Learning from Help Documents Reinforce the AppAgent by retrieving knowledge from help documents. Learning from Bing Search Reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge. Learning from Self-Experience Reinforce the AppAgent by learning from its own successful experiences. Learning from User Demonstrations Reinforce the AppAgent by learning from action trajectories demonstrated by users. Knowledge Provision UFO provides the knowledge to the AppAgent through a context_provision method defined in the AppAgent class: def context_provision(self, request: str = \"\") -> None: \"\"\" Provision the context for the app agent. :param request: The Bing search query. \"\"\" # Load the offline document indexer for the app agent if available. if configs[\"RAG_OFFLINE_DOCS\"]: utils.print_with_color( \"Loading offline help document indexer for {app}...\".format( app=self._process_name ), \"magenta\", ) self.build_offline_docs_retriever() # Load the online search indexer for the app agent if available. if configs[\"RAG_ONLINE_SEARCH\"] and request: utils.print_with_color(\"Creating a Bing search indexer...\", \"magenta\") self.build_online_search_retriever( request, configs[\"RAG_ONLINE_SEARCH_TOPK\"] ) # Load the experience indexer for the app agent if available. if configs[\"RAG_EXPERIENCE\"]: utils.print_with_color(\"Creating an experience indexer...\", \"magenta\") experience_path = configs[\"EXPERIENCE_SAVED_PATH\"] db_path = os.path.join(experience_path, \"experience_db\") self.build_experience_retriever(db_path) # Load the demonstration indexer for the app agent if available. if configs[\"RAG_DEMONSTRATION\"]: utils.print_with_color(\"Creating an demonstration indexer...\", \"magenta\") demonstration_path = configs[\"DEMONSTRATION_SAVED_PATH\"] db_path = os.path.join(demonstration_path, \"demonstration_db\") self.build_human_demonstration_retriever(db_path) The context_provision method loads the offline document indexer, online search indexer, experience indexer, and demonstration indexer for the AppAgent based on the configuration settings in the config_dev.yaml file. Reference UFO employs the Retriever class located in the ufo/rag/retriever.py file to retrieve knowledge from various sources. The Retriever class provides the following methods to retrieve knowledge: Bases: ABC Class to retrieve documents. Create a new Retriever. Source code in rag/retriever.py 42 43 44 45 46 47 48 49 def __init__ ( self ) -> None : \"\"\" Create a new Retriever. \"\"\" self . indexer = self . get_indexer () pass get_indexer () abstractmethod Get the indexer. Returns: \u2013 The indexer. Source code in rag/retriever.py 51 52 53 54 55 56 57 @abstractmethod def get_indexer ( self ): \"\"\" Get the indexer. :return: The indexer. \"\"\" pass retrieve ( query , top_k , filter = None ) Retrieve the document from the given query. :filter: The filter to apply to the retrieved documents. Parameters: query ( str ) \u2013 The query to retrieve the document from. top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The document from the given query. Source code in rag/retriever.py 59 60 61 62 63 64 65 66 67 68 69 70 def retrieve ( self , query : str , top_k : int , filter = None ): \"\"\" Retrieve the document from the given query. :param query: The query to retrieve the document from. :param top_k: The number of documents to retrieve. :filter: The filter to apply to the retrieved documents. :return: The document from the given query. \"\"\" if not self . indexer : return None return self . indexer . similarity_search ( query , top_k , filter = filter )","title":"Overview"},{"location":"advanced_usage/reinforce_appagent/overview/#reinforcing-appagent","text":"UFO provides versatile mechanisms to reinforce the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other techniques. These enhance the AppAgent's understanding of the task, improving the quality of the generated plans, and increasing the efficiency of the AppAgent's interactions with the application. We currently support the following reinforcement methods: Reinforcement Method Description Learning from Help Documents Reinforce the AppAgent by retrieving knowledge from help documents. Learning from Bing Search Reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge. Learning from Self-Experience Reinforce the AppAgent by learning from its own successful experiences. Learning from User Demonstrations Reinforce the AppAgent by learning from action trajectories demonstrated by users.","title":"Reinforcing AppAgent"},{"location":"advanced_usage/reinforce_appagent/overview/#knowledge-provision","text":"UFO provides the knowledge to the AppAgent through a context_provision method defined in the AppAgent class: def context_provision(self, request: str = \"\") -> None: \"\"\" Provision the context for the app agent. :param request: The Bing search query. \"\"\" # Load the offline document indexer for the app agent if available. if configs[\"RAG_OFFLINE_DOCS\"]: utils.print_with_color( \"Loading offline help document indexer for {app}...\".format( app=self._process_name ), \"magenta\", ) self.build_offline_docs_retriever() # Load the online search indexer for the app agent if available. if configs[\"RAG_ONLINE_SEARCH\"] and request: utils.print_with_color(\"Creating a Bing search indexer...\", \"magenta\") self.build_online_search_retriever( request, configs[\"RAG_ONLINE_SEARCH_TOPK\"] ) # Load the experience indexer for the app agent if available. if configs[\"RAG_EXPERIENCE\"]: utils.print_with_color(\"Creating an experience indexer...\", \"magenta\") experience_path = configs[\"EXPERIENCE_SAVED_PATH\"] db_path = os.path.join(experience_path, \"experience_db\") self.build_experience_retriever(db_path) # Load the demonstration indexer for the app agent if available. if configs[\"RAG_DEMONSTRATION\"]: utils.print_with_color(\"Creating an demonstration indexer...\", \"magenta\") demonstration_path = configs[\"DEMONSTRATION_SAVED_PATH\"] db_path = os.path.join(demonstration_path, \"demonstration_db\") self.build_human_demonstration_retriever(db_path) The context_provision method loads the offline document indexer, online search indexer, experience indexer, and demonstration indexer for the AppAgent based on the configuration settings in the config_dev.yaml file.","title":"Knowledge Provision"},{"location":"advanced_usage/reinforce_appagent/overview/#reference","text":"UFO employs the Retriever class located in the ufo/rag/retriever.py file to retrieve knowledge from various sources. The Retriever class provides the following methods to retrieve knowledge: Bases: ABC Class to retrieve documents. Create a new Retriever. Source code in rag/retriever.py 42 43 44 45 46 47 48 49 def __init__ ( self ) -> None : \"\"\" Create a new Retriever. \"\"\" self . indexer = self . get_indexer () pass","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/overview/#rag.retriever.Retriever.get_indexer","text":"Get the indexer. Returns: \u2013 The indexer. Source code in rag/retriever.py 51 52 53 54 55 56 57 @abstractmethod def get_indexer ( self ): \"\"\" Get the indexer. :return: The indexer. \"\"\" pass","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/overview/#rag.retriever.Retriever.retrieve","text":"Retrieve the document from the given query. :filter: The filter to apply to the retrieved documents. Parameters: query ( str ) \u2013 The query to retrieve the document from. top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The document from the given query. Source code in rag/retriever.py 59 60 61 62 63 64 65 66 67 68 69 70 def retrieve ( self , query : str , top_k : int , filter = None ): \"\"\" Retrieve the document from the given query. :param query: The query to retrieve the document from. :param top_k: The number of documents to retrieve. :filter: The filter to apply to the retrieved documents. :return: The document from the given query. \"\"\" if not self . indexer : return None return self . indexer . similarity_search ( query , top_k , filter = filter )","title":"retrieve"},{"location":"agents/app_agent/","text":"AppAgent \ud83d\udc7e An AppAgent is responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. The AppAgent is created by the HostAgent to fulfill a sub-task within a Round . The AppAgent is responsible for executing the necessary actions within the application to fulfill the user's request. The AppAgent has the following features: ReAct with the Application - The AppAgent recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request. Comprehension Enhancement - The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases, and demonstration libraries, making the agent an application \"expert\". Versatile Skill Set - The AppAgent is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and \"Copilot\". Tip You can find the how to enhance the AppAgent with external knowledge bases and demonstration libraries in the Reinforcing AppAgent documentation. We show the framework of the AppAgent in the following diagram: AppAgent Input To interact with the application, the AppAgent receives the following inputs: Input Description Type User Request The user's request in natural language. String Sub-Task The sub-task description to be executed by the AppAgent , assigned by the HostAgent . String Current Application The name of the application to be interacted with. String Control Information Index, name and control type of available controls in the application. List of Dictionaries Application Screenshots Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). List of Strings Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following steps. List of Strings HostAgent Message The message from the HostAgent for the completion of the sub-task. String Retrived Information The retrieved information from external knowledge bases or demonstration libraries. String Blackboard The shared memory space for storing and sharing information among the agents. Dictionary Below is an example of the annotated application screenshot with labeled controls. This follow the Set-of-Mark paradigm. By processing these inputs, the AppAgent determines the necessary actions to fulfill the user's request within the application. Tip Whether to concatenate the clean screenshot and annotated screenshot can be configured in the CONCAT_SCREENSHOT field in the config_dev.yaml file. Tip Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the INCLUDE_LAST_SCREENSHOT field in the config_dev.yaml file. AppAgent Output With the inputs provided, the AppAgent generates the following outputs: Output Description Type Observation The observation of the current application screenshots. String Thought The logical reasoning process of the AppAgent . String ControlLabel The index of the selected control to interact with. String ControlText The name of the selected control to interact with. String Function The function to be executed on the selected control. String Args The arguments required for the function execution. List of Strings Status The status of the agent, mapped to the AgentState . String Plan The plan for the following steps after the current action. List of Strings Comment Additional comments or information provided to the user. String SaveScreenshot The flag to save the screenshot of the application to the blackboard for future reference. Boolean Below is an example of the AppAgent output: { \"Observation\": \"Application screenshot\", \"Thought\": \"Logical reasoning process\", \"ControlLabel\": \"Control index\", \"ControlText\": \"Control name\", \"Function\": \"Function name\", \"Args\": [\"arg1\", \"arg2\"], \"Status\": \"AgentState\", \"Plan\": [\"Step 1\", \"Step 2\"], \"Comment\": \"Additional comments\", \"SaveScreenshot\": true } Info The AppAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python. AppAgent State The AppAgent state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the ufo/agents/states/app_agent_states.py module. The states include: State Description CONTINUE The AppAgent continues executing the current action. FINISH The AppAgent has completed the current sub-task. ERROR The AppAgent encountered an error during execution. FAIL The AppAgent believes the current sub-task is unachievable. CONFIRM The AppAgent is confirming the user's input or action. SCREENSHOT The AppAgent believes the current screenshot is not clear in annotating the control and requests a new screenshot. The state machine diagram for the AppAgent is shown below: The AppAgent progresses through these states to execute the necessary actions within the application and fulfill the sub-task assigned by the HostAgent . Knowledge Enhancement The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. The AppAgent leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance. Learning from Help Documents User can provide help documents to the AppAgent to enhance its comprehension of the application and improve its performance in the config.yaml file. Tip Please find details configuration in the documentation . Tip You may also refer to the here for how to provide help documents to the AppAgent . In the AppAgent , it calls the build_offline_docs_retriever to build a help document retriever, and uses the retrived_documents_prompt_helper to contruct the prompt for the AppAgent . Learning from Bing Search Since help documents may not cover all the information or the information may be outdated, the AppAgent can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the config.yaml file. Tip Please find details configuration in the documentation . Tip You may also refer to the here for the implementation of Bing search in the AppAgent . In the AppAgent , it calls the build_online_search_retriever to build a Bing search retriever, and uses the retrived_documents_prompt_helper to contruct the prompt for the AppAgent . Learning from Self-Demonstrations You may save successful action trajectories in the AppAgent to learn from self-demonstrations and improve its performance. After the completion of a session , the AppAgent will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the config.yaml file. Tip You can find details of the configuration in the documentation . Tip You may also refer to the here for the implementation of self-demonstrations in the AppAgent . In the AppAgent , it calls the build_experience_retriever to build a self-demonstration retriever, and uses the rag_experience_retrieve to retrieve the demonstration for the AppAgent . Learning from Human Demonstrations In addition to self-demonstrations, you can also provide human demonstrations to the AppAgent to enhance its performance by using the Step Recorder tool built in the Windows OS. The AppAgent will learn from the human demonstrations to improve its performance and achieve better personalization. The use of human demonstrations can be configured in the config.yaml file. Tip You can find details of the configuration in the documentation . Tip You may also refer to the here for the implementation of human demonstrations in the AppAgent . In the AppAgent , it calls the build_human_demonstration_retriever to build a human demonstration retriever, and uses the rag_experience_retrieve to retrieve the demonstration for the AppAgent . Skill Set for Automation The AppAgent is equipped with a versatile skill set to support comprehensive automation within the application by calling the create_puppeteer_interface method. The skills include: Skill Description UI Automation Mimicking user interactions with the application UI controls using the UI Automation and Win32 API. Native API Accessing the application's native API to execute specific functions and actions. In-App Agent Leveraging the in-app agent to interact with the application's internal functions and features. By utilizing these skills, the AppAgent can efficiently interact with the application and fulfill the user's request. You can find more details in the Automator documentation and the code in the ufo/automator module. Reference Bases: BasicAgent The AppAgent class that manages the interaction with the application. Initialize the AppAgent. :name: The name of the agent. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. skip_prompter ( bool , default: False ) \u2013 The flag indicating whether to skip the prompter initialization. Source code in agents/agent/app_agent.py 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , skip_prompter : bool = False , ) -> None : \"\"\" Initialize the AppAgent. :name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param skip_prompter: The flag indicating whether to skip the prompter initialization. \"\"\" super () . __init__ ( name = name ) if not skip_prompter : self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) self . _process_name = process_name self . _app_root_name = app_root_name self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . human_demonstration_retriever = None self . Puppeteer = self . create_puppeteer_interface () self . set_state ( ContinueAppAgentState ()) status_manager : AppAgentStatus property Get the status manager. build_experience_retriever ( db_path ) Build the experience retriever. Parameters: db_path ( str ) \u2013 The path to the experience database. Returns: None \u2013 The experience retriever. Source code in agents/agent/app_agent.py 346 347 348 349 350 351 352 353 354 def build_experience_retriever ( self , db_path : str ) -> None : \"\"\" Build the experience retriever. :param db_path: The path to the experience database. :return: The experience retriever. \"\"\" self . experience_retriever = self . retriever_factory . create_retriever ( \"experience\" , db_path ) build_human_demonstration_retriever ( db_path ) Build the human demonstration retriever. Parameters: db_path ( str ) \u2013 The path to the human demonstration database. Returns: None \u2013 The human demonstration retriever. Source code in agents/agent/app_agent.py 356 357 358 359 360 361 362 363 364 def build_human_demonstration_retriever ( self , db_path : str ) -> None : \"\"\" Build the human demonstration retriever. :param db_path: The path to the human demonstration database. :return: The human demonstration retriever. \"\"\" self . human_demonstration_retriever = self . retriever_factory . create_retriever ( \"demonstration\" , db_path ) build_offline_docs_retriever () Build the offline docs retriever. Source code in agents/agent/app_agent.py 328 329 330 331 332 333 334 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" self . offline_doc_retriever = self . retriever_factory . create_retriever ( \"offline\" , self . _app_root_name ) build_online_search_retriever ( request , top_k ) Build the online search retriever. Parameters: request ( str ) \u2013 The request for online Bing search. top_k ( int ) \u2013 The number of documents to retrieve. Source code in agents/agent/app_agent.py 336 337 338 339 340 341 342 343 344 def build_online_search_retriever ( self , request : str , top_k : int ) -> None : \"\"\" Build the online search retriever. :param request: The request for online Bing search. :param top_k: The number of documents to retrieve. \"\"\" self . online_doc_retriever = self . retriever_factory . create_retriever ( \"online\" , request , top_k ) context_provision ( request = '' ) Provision the context for the app agent. Parameters: request ( str , default: '' ) \u2013 The request sent to the Bing search retriever. Source code in agents/agent/app_agent.py 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 def context_provision ( self , request : str = \"\" ) -> None : \"\"\" Provision the context for the app agent. :param request: The request sent to the Bing search retriever. \"\"\" # Load the offline document indexer for the app agent if available. if configs [ \"RAG_OFFLINE_DOCS\" ]: utils . print_with_color ( \"Loading offline help document indexer for {app} ...\" . format ( app = self . _process_name ), \"magenta\" , ) self . build_offline_docs_retriever () # Load the online search indexer for the app agent if available. if configs [ \"RAG_ONLINE_SEARCH\" ] and request : utils . print_with_color ( \"Creating a Bing search indexer...\" , \"magenta\" ) self . build_online_search_retriever ( request , configs [ \"RAG_ONLINE_SEARCH_TOPK\" ] ) # Load the experience indexer for the app agent if available. if configs [ \"RAG_EXPERIENCE\" ]: utils . print_with_color ( \"Creating an experience indexer...\" , \"magenta\" ) experience_path = configs [ \"EXPERIENCE_SAVED_PATH\" ] db_path = os . path . join ( experience_path , \"experience_db\" ) self . build_experience_retriever ( db_path ) # Load the demonstration indexer for the app agent if available. if configs [ \"RAG_DEMONSTRATION\" ]: utils . print_with_color ( \"Creating an demonstration indexer...\" , \"magenta\" ) demonstration_path = configs [ \"DEMONSTRATION_SAVED_PATH\" ] db_path = os . path . join ( demonstration_path , \"demonstration_db\" ) self . build_human_demonstration_retriever ( db_path ) create_puppeteer_interface () Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/app_agent.py 299 300 301 302 303 304 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. :return: The Puppeteer interface. \"\"\" return puppeteer . AppPuppeteer ( self . _process_name , self . _app_root_name ) external_knowledge_prompt_helper ( request , offline_top_k , online_top_k ) Retrieve the external knowledge and construct the prompt. Parameters: request ( str ) \u2013 The request. offline_top_k ( int ) \u2013 The number of offline documents to retrieve. online_top_k ( int ) \u2013 The number of online documents to retrieve. Returns: str \u2013 The prompt message for the external_knowledge. Source code in agents/agent/app_agent.py 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 def external_knowledge_prompt_helper ( self , request : str , offline_top_k : int , online_top_k : int ) -> str : \"\"\" Retrieve the external knowledge and construct the prompt. :param request: The request. :param offline_top_k: The number of offline documents to retrieve. :param online_top_k: The number of online documents to retrieve. :return: The prompt message for the external_knowledge. \"\"\" retrieved_docs = \"\" # Retrieve offline documents and construct the prompt if self . offline_doc_retriever : offline_docs = self . offline_doc_retriever . retrieve ( \"How to {query} for {app} \" . format ( query = request , app = self . _process_name ), offline_top_k , filter = None , ) offline_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Help Documents\" , \"Document\" , [ doc . metadata [ \"text\" ] for doc in offline_docs ], ) retrieved_docs += offline_docs_prompt # Retrieve online documents and construct the prompt if self . online_doc_retriever : online_search_docs = self . online_doc_retriever . retrieve ( request , online_top_k , filter = None ) online_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Online Search Results\" , \"Search Result\" , [ doc . page_content for doc in online_search_docs ], ) retrieved_docs += online_docs_prompt return retrieved_docs get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_root_name ( str ) \u2013 The root name of the app. Returns: AppAgentPrompter \u2013 The prompter instance. Source code in agents/agent/app_agent.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_root_name : str , ) -> AppAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. \"\"\" return AppAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) message_constructor ( dynamic_examples , dynamic_tips , dynamic_knowledge , image_list , control_info , prev_subtask , plan , request , subtask , host_message , include_last_screenshot ) Construct the prompt message for the AppAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the external knowledge base. image_list ( List ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. plan ( List [ str ] ) \u2013 The plan list. request ( str ) \u2013 The overall user request. subtask ( str ) \u2013 The subtask for the current AppAgent to process. host_message ( List [ str ] ) \u2013 The message from the HostAgent. include_last_screenshot ( bool ) \u2013 The flag indicating whether to include the last screenshot. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The prompt message. Source code in agents/agent/app_agent.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List , control_info : str , prev_subtask : List [ Dict [ str , str ]], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], include_last_screenshot : bool , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the prompt message for the AppAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base. :param image_list: The list of screenshot images. :param control_info: The control information. :param plan: The plan list. :param request: The overall user request. :param subtask: The subtask for the current AppAgent to process. :param host_message: The message from the HostAgent. :param include_last_screenshot: The flag indicating whether to include the last screenshot. :return: The prompt message. \"\"\" appagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) appagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . _process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , include_last_screenshot = include_last_screenshot , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . blackboard_to_prompt () appagent_prompt_user_message = ( blackboard_prompt + appagent_prompt_user_message ) appagent_prompt_message = self . prompter . prompt_construction ( appagent_prompt_system_message , appagent_prompt_user_message ) return appagent_prompt_message print_response ( response_dict ) Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/app_agent.py 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. \"\"\" control_text = response_dict . get ( \"ControlText\" ) control_label = response_dict . get ( \"ControlLabel\" ) if not control_text and not control_label : control_text = \"[No control selected.]\" control_label = \"[No control label selected.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) plan = response_dict . get ( \"Plan\" ) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) function_call = response_dict . get ( \"Function\" ) args = utils . revise_line_breaks ( response_dict . get ( \"Args\" )) # Generate the function call string action = self . Puppeteer . get_command_string ( function_call , args ) utils . print_with_color ( \"Observations\ud83d\udc40: {observation} \" . format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) utils . print_with_color ( \"Selected item\ud83d\udd79\ufe0f: {control_text} , Label: {label} \" . format ( control_text = control_text , label = control_label ), \"yellow\" , ) utils . print_with_color ( \"Action applied\u2692\ufe0f: {action} \" . format ( action = action ), \"blue\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Next Plan\ud83d\udcda: {plan} \" . format ( plan = \" \\n \" . join ( plan )), \"cyan\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" ) screenshot_saving = response_dict . get ( \"SaveScreenshot\" , {}) if screenshot_saving . get ( \"save\" , False ): utils . print_with_color ( \"Notice: The current screenshot\ud83d\udcf8 is saved to the blackboard.\" , \"yellow\" , ) utils . print_with_color ( \"Saving reason: {reason} \" . format ( reason = screenshot_saving . get ( \"reason\" ) ), \"yellow\" , ) process ( context ) Process the agent. Parameters: context ( Context ) \u2013 The context. Source code in agents/agent/app_agent.py 290 291 292 293 294 295 296 297 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = AppAgentProcessor ( agent = self , context = context ) self . processor . process () self . status = self . processor . status process_comfirmation () Process the user confirmation. Returns: bool \u2013 The decision. Source code in agents/agent/app_agent.py 306 307 308 309 310 311 312 313 314 315 316 317 318 319 def process_comfirmation ( self ) -> bool : \"\"\" Process the user confirmation. :return: The decision. \"\"\" action = self . processor . action control_text = self . processor . control_text decision = interactor . sensitive_step_asker ( action , control_text ) if not decision : utils . print_with_color ( \"The user has canceled the action.\" , \"red\" ) return decision rag_demonstration_retrieve ( request , demonstration_top_k ) Retrieving demonstration examples for the user request. Parameters: request ( str ) \u2013 The user request. demonstration_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 def rag_demonstration_retrieve ( self , request : str , demonstration_top_k : int ) -> str : \"\"\" Retrieving demonstration examples for the user request. :param request: The user request. :param demonstration_top_k: The number of documents to retrieve. :return: The retrieved examples and tips string. \"\"\" # Retrieve demonstration examples. demonstration_docs = self . human_demonstration_retriever . retrieve ( request , demonstration_top_k ) if demonstration_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in demonstration_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in demonstration_docs ] else : examples = [] tips = [] return examples , tips rag_experience_retrieve ( request , experience_top_k ) Retrieving experience examples for the user request. Parameters: request ( str ) \u2013 The user request. experience_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 def rag_experience_retrieve ( self , request : str , experience_top_k : int ) -> str : \"\"\" Retrieving experience examples for the user request. :param request: The user request. :param experience_top_k: The number of documents to retrieve. :return: The retrieved examples and tips string. \"\"\" # Retrieve experience examples. Only retrieve the examples that are related to the current application. experience_docs = self . experience_retriever . retrieve ( request , experience_top_k , filter = lambda x : self . _app_root_name . lower () in [ app . lower () for app in x [ \"app_list\" ]], ) if experience_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in experience_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in experience_docs ] else : examples = [] tips = [] return examples , tips","title":"AppAgent"},{"location":"agents/app_agent/#appagent","text":"An AppAgent is responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. The AppAgent is created by the HostAgent to fulfill a sub-task within a Round . The AppAgent is responsible for executing the necessary actions within the application to fulfill the user's request. The AppAgent has the following features: ReAct with the Application - The AppAgent recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request. Comprehension Enhancement - The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases, and demonstration libraries, making the agent an application \"expert\". Versatile Skill Set - The AppAgent is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and \"Copilot\". Tip You can find the how to enhance the AppAgent with external knowledge bases and demonstration libraries in the Reinforcing AppAgent documentation. We show the framework of the AppAgent in the following diagram:","title":"AppAgent \ud83d\udc7e"},{"location":"agents/app_agent/#appagent-input","text":"To interact with the application, the AppAgent receives the following inputs: Input Description Type User Request The user's request in natural language. String Sub-Task The sub-task description to be executed by the AppAgent , assigned by the HostAgent . String Current Application The name of the application to be interacted with. String Control Information Index, name and control type of available controls in the application. List of Dictionaries Application Screenshots Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). List of Strings Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following steps. List of Strings HostAgent Message The message from the HostAgent for the completion of the sub-task. String Retrived Information The retrieved information from external knowledge bases or demonstration libraries. String Blackboard The shared memory space for storing and sharing information among the agents. Dictionary Below is an example of the annotated application screenshot with labeled controls. This follow the Set-of-Mark paradigm.","title":"AppAgent Input"},{"location":"agents/app_agent/#appagent-output","text":"With the inputs provided, the AppAgent generates the following outputs: Output Description Type Observation The observation of the current application screenshots. String Thought The logical reasoning process of the AppAgent . String ControlLabel The index of the selected control to interact with. String ControlText The name of the selected control to interact with. String Function The function to be executed on the selected control. String Args The arguments required for the function execution. List of Strings Status The status of the agent, mapped to the AgentState . String Plan The plan for the following steps after the current action. List of Strings Comment Additional comments or information provided to the user. String SaveScreenshot The flag to save the screenshot of the application to the blackboard for future reference. Boolean Below is an example of the AppAgent output: { \"Observation\": \"Application screenshot\", \"Thought\": \"Logical reasoning process\", \"ControlLabel\": \"Control index\", \"ControlText\": \"Control name\", \"Function\": \"Function name\", \"Args\": [\"arg1\", \"arg2\"], \"Status\": \"AgentState\", \"Plan\": [\"Step 1\", \"Step 2\"], \"Comment\": \"Additional comments\", \"SaveScreenshot\": true } Info The AppAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.","title":"AppAgent Output"},{"location":"agents/app_agent/#appagent-state","text":"The AppAgent state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the ufo/agents/states/app_agent_states.py module. The states include: State Description CONTINUE The AppAgent continues executing the current action. FINISH The AppAgent has completed the current sub-task. ERROR The AppAgent encountered an error during execution. FAIL The AppAgent believes the current sub-task is unachievable. CONFIRM The AppAgent is confirming the user's input or action. SCREENSHOT The AppAgent believes the current screenshot is not clear in annotating the control and requests a new screenshot. The state machine diagram for the AppAgent is shown below:","title":"AppAgent State"},{"location":"agents/app_agent/#knowledge-enhancement","text":"The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. The AppAgent leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance.","title":"Knowledge Enhancement"},{"location":"agents/app_agent/#learning-from-help-documents","text":"User can provide help documents to the AppAgent to enhance its comprehension of the application and improve its performance in the config.yaml file. Tip Please find details configuration in the documentation . Tip You may also refer to the here for how to provide help documents to the AppAgent . In the AppAgent , it calls the build_offline_docs_retriever to build a help document retriever, and uses the retrived_documents_prompt_helper to contruct the prompt for the AppAgent .","title":"Learning from Help Documents"},{"location":"agents/app_agent/#learning-from-bing-search","text":"Since help documents may not cover all the information or the information may be outdated, the AppAgent can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the config.yaml file. Tip Please find details configuration in the documentation . Tip You may also refer to the here for the implementation of Bing search in the AppAgent . In the AppAgent , it calls the build_online_search_retriever to build a Bing search retriever, and uses the retrived_documents_prompt_helper to contruct the prompt for the AppAgent .","title":"Learning from Bing Search"},{"location":"agents/app_agent/#learning-from-self-demonstrations","text":"You may save successful action trajectories in the AppAgent to learn from self-demonstrations and improve its performance. After the completion of a session , the AppAgent will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the config.yaml file. Tip You can find details of the configuration in the documentation . Tip You may also refer to the here for the implementation of self-demonstrations in the AppAgent . In the AppAgent , it calls the build_experience_retriever to build a self-demonstration retriever, and uses the rag_experience_retrieve to retrieve the demonstration for the AppAgent .","title":"Learning from Self-Demonstrations"},{"location":"agents/app_agent/#learning-from-human-demonstrations","text":"In addition to self-demonstrations, you can also provide human demonstrations to the AppAgent to enhance its performance by using the Step Recorder tool built in the Windows OS. The AppAgent will learn from the human demonstrations to improve its performance and achieve better personalization. The use of human demonstrations can be configured in the config.yaml file. Tip You can find details of the configuration in the documentation . Tip You may also refer to the here for the implementation of human demonstrations in the AppAgent . In the AppAgent , it calls the build_human_demonstration_retriever to build a human demonstration retriever, and uses the rag_experience_retrieve to retrieve the demonstration for the AppAgent .","title":"Learning from Human Demonstrations"},{"location":"agents/app_agent/#skill-set-for-automation","text":"The AppAgent is equipped with a versatile skill set to support comprehensive automation within the application by calling the create_puppeteer_interface method. The skills include: Skill Description UI Automation Mimicking user interactions with the application UI controls using the UI Automation and Win32 API. Native API Accessing the application's native API to execute specific functions and actions. In-App Agent Leveraging the in-app agent to interact with the application's internal functions and features. By utilizing these skills, the AppAgent can efficiently interact with the application and fulfill the user's request. You can find more details in the Automator documentation and the code in the ufo/automator module.","title":"Skill Set for Automation"},{"location":"agents/app_agent/#reference","text":"Bases: BasicAgent The AppAgent class that manages the interaction with the application. Initialize the AppAgent. :name: The name of the agent. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. skip_prompter ( bool , default: False ) \u2013 The flag indicating whether to skip the prompter initialization. Source code in agents/agent/app_agent.py 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , skip_prompter : bool = False , ) -> None : \"\"\" Initialize the AppAgent. :name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param skip_prompter: The flag indicating whether to skip the prompter initialization. \"\"\" super () . __init__ ( name = name ) if not skip_prompter : self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) self . _process_name = process_name self . _app_root_name = app_root_name self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . human_demonstration_retriever = None self . Puppeteer = self . create_puppeteer_interface () self . set_state ( ContinueAppAgentState ())","title":"Reference"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.status_manager","text":"Get the status manager.","title":"status_manager"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_experience_retriever","text":"Build the experience retriever. Parameters: db_path ( str ) \u2013 The path to the experience database. Returns: None \u2013 The experience retriever. Source code in agents/agent/app_agent.py 346 347 348 349 350 351 352 353 354 def build_experience_retriever ( self , db_path : str ) -> None : \"\"\" Build the experience retriever. :param db_path: The path to the experience database. :return: The experience retriever. \"\"\" self . experience_retriever = self . retriever_factory . create_retriever ( \"experience\" , db_path )","title":"build_experience_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_human_demonstration_retriever","text":"Build the human demonstration retriever. Parameters: db_path ( str ) \u2013 The path to the human demonstration database. Returns: None \u2013 The human demonstration retriever. Source code in agents/agent/app_agent.py 356 357 358 359 360 361 362 363 364 def build_human_demonstration_retriever ( self , db_path : str ) -> None : \"\"\" Build the human demonstration retriever. :param db_path: The path to the human demonstration database. :return: The human demonstration retriever. \"\"\" self . human_demonstration_retriever = self . retriever_factory . create_retriever ( \"demonstration\" , db_path )","title":"build_human_demonstration_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_offline_docs_retriever","text":"Build the offline docs retriever. Source code in agents/agent/app_agent.py 328 329 330 331 332 333 334 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" self . offline_doc_retriever = self . retriever_factory . create_retriever ( \"offline\" , self . _app_root_name )","title":"build_offline_docs_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_online_search_retriever","text":"Build the online search retriever. Parameters: request ( str ) \u2013 The request for online Bing search. top_k ( int ) \u2013 The number of documents to retrieve. Source code in agents/agent/app_agent.py 336 337 338 339 340 341 342 343 344 def build_online_search_retriever ( self , request : str , top_k : int ) -> None : \"\"\" Build the online search retriever. :param request: The request for online Bing search. :param top_k: The number of documents to retrieve. \"\"\" self . online_doc_retriever = self . retriever_factory . create_retriever ( \"online\" , request , top_k )","title":"build_online_search_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.context_provision","text":"Provision the context for the app agent. Parameters: request ( str , default: '' ) \u2013 The request sent to the Bing search retriever. Source code in agents/agent/app_agent.py 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 def context_provision ( self , request : str = \"\" ) -> None : \"\"\" Provision the context for the app agent. :param request: The request sent to the Bing search retriever. \"\"\" # Load the offline document indexer for the app agent if available. if configs [ \"RAG_OFFLINE_DOCS\" ]: utils . print_with_color ( \"Loading offline help document indexer for {app} ...\" . format ( app = self . _process_name ), \"magenta\" , ) self . build_offline_docs_retriever () # Load the online search indexer for the app agent if available. if configs [ \"RAG_ONLINE_SEARCH\" ] and request : utils . print_with_color ( \"Creating a Bing search indexer...\" , \"magenta\" ) self . build_online_search_retriever ( request , configs [ \"RAG_ONLINE_SEARCH_TOPK\" ] ) # Load the experience indexer for the app agent if available. if configs [ \"RAG_EXPERIENCE\" ]: utils . print_with_color ( \"Creating an experience indexer...\" , \"magenta\" ) experience_path = configs [ \"EXPERIENCE_SAVED_PATH\" ] db_path = os . path . join ( experience_path , \"experience_db\" ) self . build_experience_retriever ( db_path ) # Load the demonstration indexer for the app agent if available. if configs [ \"RAG_DEMONSTRATION\" ]: utils . print_with_color ( \"Creating an demonstration indexer...\" , \"magenta\" ) demonstration_path = configs [ \"DEMONSTRATION_SAVED_PATH\" ] db_path = os . path . join ( demonstration_path , \"demonstration_db\" ) self . build_human_demonstration_retriever ( db_path )","title":"context_provision"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.create_puppeteer_interface","text":"Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/app_agent.py 299 300 301 302 303 304 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. :return: The Puppeteer interface. \"\"\" return puppeteer . AppPuppeteer ( self . _process_name , self . _app_root_name )","title":"create_puppeteer_interface"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.external_knowledge_prompt_helper","text":"Retrieve the external knowledge and construct the prompt. Parameters: request ( str ) \u2013 The request. offline_top_k ( int ) \u2013 The number of offline documents to retrieve. online_top_k ( int ) \u2013 The number of online documents to retrieve. Returns: str \u2013 The prompt message for the external_knowledge. Source code in agents/agent/app_agent.py 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 def external_knowledge_prompt_helper ( self , request : str , offline_top_k : int , online_top_k : int ) -> str : \"\"\" Retrieve the external knowledge and construct the prompt. :param request: The request. :param offline_top_k: The number of offline documents to retrieve. :param online_top_k: The number of online documents to retrieve. :return: The prompt message for the external_knowledge. \"\"\" retrieved_docs = \"\" # Retrieve offline documents and construct the prompt if self . offline_doc_retriever : offline_docs = self . offline_doc_retriever . retrieve ( \"How to {query} for {app} \" . format ( query = request , app = self . _process_name ), offline_top_k , filter = None , ) offline_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Help Documents\" , \"Document\" , [ doc . metadata [ \"text\" ] for doc in offline_docs ], ) retrieved_docs += offline_docs_prompt # Retrieve online documents and construct the prompt if self . online_doc_retriever : online_search_docs = self . online_doc_retriever . retrieve ( request , online_top_k , filter = None ) online_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Online Search Results\" , \"Search Result\" , [ doc . page_content for doc in online_search_docs ], ) retrieved_docs += online_docs_prompt return retrieved_docs","title":"external_knowledge_prompt_helper"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.get_prompter","text":"Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_root_name ( str ) \u2013 The root name of the app. Returns: AppAgentPrompter \u2013 The prompter instance. Source code in agents/agent/app_agent.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_root_name : str , ) -> AppAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. \"\"\" return AppAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name )","title":"get_prompter"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.message_constructor","text":"Construct the prompt message for the AppAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the external knowledge base. image_list ( List ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. plan ( List [ str ] ) \u2013 The plan list. request ( str ) \u2013 The overall user request. subtask ( str ) \u2013 The subtask for the current AppAgent to process. host_message ( List [ str ] ) \u2013 The message from the HostAgent. include_last_screenshot ( bool ) \u2013 The flag indicating whether to include the last screenshot. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The prompt message. Source code in agents/agent/app_agent.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List , control_info : str , prev_subtask : List [ Dict [ str , str ]], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], include_last_screenshot : bool , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the prompt message for the AppAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base. :param image_list: The list of screenshot images. :param control_info: The control information. :param plan: The plan list. :param request: The overall user request. :param subtask: The subtask for the current AppAgent to process. :param host_message: The message from the HostAgent. :param include_last_screenshot: The flag indicating whether to include the last screenshot. :return: The prompt message. \"\"\" appagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) appagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . _process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , include_last_screenshot = include_last_screenshot , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . blackboard_to_prompt () appagent_prompt_user_message = ( blackboard_prompt + appagent_prompt_user_message ) appagent_prompt_message = self . prompter . prompt_construction ( appagent_prompt_system_message , appagent_prompt_user_message ) return appagent_prompt_message","title":"message_constructor"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.print_response","text":"Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/app_agent.py 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. \"\"\" control_text = response_dict . get ( \"ControlText\" ) control_label = response_dict . get ( \"ControlLabel\" ) if not control_text and not control_label : control_text = \"[No control selected.]\" control_label = \"[No control label selected.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) plan = response_dict . get ( \"Plan\" ) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) function_call = response_dict . get ( \"Function\" ) args = utils . revise_line_breaks ( response_dict . get ( \"Args\" )) # Generate the function call string action = self . Puppeteer . get_command_string ( function_call , args ) utils . print_with_color ( \"Observations\ud83d\udc40: {observation} \" . format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) utils . print_with_color ( \"Selected item\ud83d\udd79\ufe0f: {control_text} , Label: {label} \" . format ( control_text = control_text , label = control_label ), \"yellow\" , ) utils . print_with_color ( \"Action applied\u2692\ufe0f: {action} \" . format ( action = action ), \"blue\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Next Plan\ud83d\udcda: {plan} \" . format ( plan = \" \\n \" . join ( plan )), \"cyan\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" ) screenshot_saving = response_dict . get ( \"SaveScreenshot\" , {}) if screenshot_saving . get ( \"save\" , False ): utils . print_with_color ( \"Notice: The current screenshot\ud83d\udcf8 is saved to the blackboard.\" , \"yellow\" , ) utils . print_with_color ( \"Saving reason: {reason} \" . format ( reason = screenshot_saving . get ( \"reason\" ) ), \"yellow\" , )","title":"print_response"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.process","text":"Process the agent. Parameters: context ( Context ) \u2013 The context. Source code in agents/agent/app_agent.py 290 291 292 293 294 295 296 297 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = AppAgentProcessor ( agent = self , context = context ) self . processor . process () self . status = self . processor . status","title":"process"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.process_comfirmation","text":"Process the user confirmation. Returns: bool \u2013 The decision. Source code in agents/agent/app_agent.py 306 307 308 309 310 311 312 313 314 315 316 317 318 319 def process_comfirmation ( self ) -> bool : \"\"\" Process the user confirmation. :return: The decision. \"\"\" action = self . processor . action control_text = self . processor . control_text decision = interactor . sensitive_step_asker ( action , control_text ) if not decision : utils . print_with_color ( \"The user has canceled the action.\" , \"red\" ) return decision","title":"process_comfirmation"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.rag_demonstration_retrieve","text":"Retrieving demonstration examples for the user request. Parameters: request ( str ) \u2013 The user request. demonstration_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 def rag_demonstration_retrieve ( self , request : str , demonstration_top_k : int ) -> str : \"\"\" Retrieving demonstration examples for the user request. :param request: The user request. :param demonstration_top_k: The number of documents to retrieve. :return: The retrieved examples and tips string. \"\"\" # Retrieve demonstration examples. demonstration_docs = self . human_demonstration_retriever . retrieve ( request , demonstration_top_k ) if demonstration_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in demonstration_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in demonstration_docs ] else : examples = [] tips = [] return examples , tips","title":"rag_demonstration_retrieve"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.rag_experience_retrieve","text":"Retrieving experience examples for the user request. Parameters: request ( str ) \u2013 The user request. experience_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 def rag_experience_retrieve ( self , request : str , experience_top_k : int ) -> str : \"\"\" Retrieving experience examples for the user request. :param request: The user request. :param experience_top_k: The number of documents to retrieve. :return: The retrieved examples and tips string. \"\"\" # Retrieve experience examples. Only retrieve the examples that are related to the current application. experience_docs = self . experience_retriever . retrieve ( request , experience_top_k , filter = lambda x : self . _app_root_name . lower () in [ app . lower () for app in x [ \"app_list\" ]], ) if experience_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in experience_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in experience_docs ] else : examples = [] tips = [] return examples , tips","title":"rag_experience_retrieve"},{"location":"agents/evaluation_agent/","text":"EvaluationAgent \ud83e\uddd0 The objective of the EvaluationAgent is to evaluate whether a Session or Round has been successfully completed. The EvaluationAgent assesses the performance of the HostAgent and AppAgent in fulfilling the request. You can configure whether to enable the EvaluationAgent in the config_dev.yaml file and the detailed documentation can be found here . Note The EvaluationAgent is fully LLM-driven and conducts evaluations based on the action trajectories and screenshots. It may not by 100% accurate since LLM may make mistakes. Configuration To enable the EvaluationAgent , you can configure the following parameters in the config_dev.yaml file to evaluate the task completion status at different levels: Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True Evaluation Inputs The EvaluationAgent takes the following inputs for evaluation: Input Description Type User Request The user's request to be evaluated. String APIs Description The description of the APIs used in the execution. List of Strings Action Trajectories The action trajectories executed by the HostAgent and AppAgent . List of Strings Screenshots The screenshots captured during the execution. List of Images For more details on how to construct the inputs, please refer to the EvaluationAgentPrompter class in ufo/prompter/eva_prompter.py . Tip You can configure whether to use all screenshots or only the first and last screenshot for evaluation in the EVA_ALL_SCREENSHOTS of the config_dev.yaml file. Evaluation Outputs The EvaluationAgent generates the following outputs after evaluation: Output Description Type reason The detailed reason for your judgment, by observing the screenshot differences and the . String sub_scores The sub-score of the evaluation in decomposing the evaluation into multiple sub-goals. List of Dictionaries complete The completion status of the evaluation, can be yes , no , or unsure . String Below is an example of the evaluation output: { \"reason\": \"The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams. The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open. The agent then focused on the chat window, input the message 'hello', and clicked the Send button. The final screenshot confirms that the message 'hello' was sent to Zac.\", \"sub_scores\": { \"correct application focus\": \"yes\", \"correct message input\": \"yes\", \"message sent successfully\": \"yes\" }, \"complete\": \"yes\"} Info The log of the evaluation results will be saved in the logs/{task_name}/evaluation.log file. The EvaluationAgent employs the CoT mechanism to first decompose the evaluation into multiple sub-goals and then evaluate each sub-goal separately. The sub-scores are then aggregated to determine the overall completion status of the evaluation. Reference Bases: BasicAgent The agent for evaluation. Initialize the FollowAgent. :agent_type: The type of the agent. :is_visual: The flag indicating whether the agent is visual or not. Source code in agents/agent/evaluation_agent.py 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def __init__ ( self , name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the FollowAgent. :agent_type: The type of the agent. :is_visual: The flag indicating whether the agent is visual or not. \"\"\" super () . __init__ ( name = name ) self . _app_root_name = app_root_name self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name , ) status_manager : EvaluatonAgentStatus property Get the status manager. evaluate ( request , log_path , eva_all_screenshots = True ) Evaluate the task completion. Parameters: log_path ( str ) \u2013 The path to the log file. Returns: Tuple [ Dict [ str , str ], float ] \u2013 The evaluation result and the cost of LLM. Source code in agents/agent/evaluation_agent.py 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 def evaluate ( self , request : str , log_path : str , eva_all_screenshots : bool = True ) -> Tuple [ Dict [ str , str ], float ]: \"\"\" Evaluate the task completion. :param log_path: The path to the log file. :return: The evaluation result and the cost of LLM. \"\"\" message = self . message_constructor ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) result , cost = self . get_response ( message = message , namescope = \"app\" , use_backup_engine = True ) result = json_parser ( result ) return result , cost get_prompter ( is_visual , prompt_template , example_prompt_template , api_prompt_template , root_name = None ) Get the prompter for the agent. Source code in agents/agent/evaluation_agent.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def get_prompter ( self , is_visual , prompt_template : str , example_prompt_template : str , api_prompt_template : str , root_name : Optional [ str ] = None , ) -> EvaluationAgentPrompter : \"\"\" Get the prompter for the agent. \"\"\" return EvaluationAgentPrompter ( is_visual = is_visual , prompt_template = prompt_template , example_prompt_template = example_prompt_template , api_prompt_template = api_prompt_template , root_name = root_name , ) message_constructor ( log_path , request , eva_all_screenshots = True ) Construct the message. Parameters: log_path ( str ) \u2013 The path to the log file. request ( str ) \u2013 The request. eva_all_screenshots ( bool , default: True ) \u2013 The flag indicating whether to evaluate all screenshots. Returns: Dict [ str , Any ] \u2013 The message. Source code in agents/agent/evaluation_agent.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 def message_constructor ( self , log_path : str , request : str , eva_all_screenshots : bool = True ) -> Dict [ str , Any ]: \"\"\" Construct the message. :param log_path: The path to the log file. :param request: The request. :param eva_all_screenshots: The flag indicating whether to evaluate all screenshots. :return: The message. \"\"\" evaagent_prompt_system_message = self . prompter . system_prompt_construction () evaagent_prompt_user_message = self . prompter . user_content_construction ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) evaagent_prompt_message = self . prompter . prompt_construction ( evaagent_prompt_system_message , evaagent_prompt_user_message ) return evaagent_prompt_message print_response ( response_dict ) Print the response of the evaluation. Parameters: response_dict ( Dict [ str , Any ] ) \u2013 The response dictionary. Source code in agents/agent/evaluation_agent.py 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 def print_response ( self , response_dict : Dict [ str , Any ]) -> None : \"\"\" Print the response of the evaluation. :param response_dict: The response dictionary. \"\"\" emoji_map = { \"yes\" : \"\u2705\" , \"no\" : \"\u274c\" , \"maybe\" : \"\u2753\" , } complete = emoji_map . get ( response_dict . get ( \"complete\" ), response_dict . get ( \"complete\" ) ) sub_scores = response_dict . get ( \"sub_scores\" , {}) reason = response_dict . get ( \"reason\" , \"\" ) print_with_color ( f \"Evaluation result\ud83e\uddd0:\" , \"magenta\" ) print_with_color ( f \"[Sub-scores\ud83d\udcca:]\" , \"green\" ) for score , evaluation in sub_scores . items (): print_with_color ( f \" { score } : { emoji_map . get ( evaluation , evaluation ) } \" , \"green\" ) print_with_color ( \"[Task is complete\ud83d\udcaf:] {complete} \" . format ( complete = complete ), \"cyan\" ) print_with_color ( f \"[Reason\ud83e\udd14:] { reason } \" . format ( reason = reason ), \"blue\" ) process_comfirmation () Comfirmation, currently do nothing. Source code in agents/agent/evaluation_agent.py 124 125 126 127 128 def process_comfirmation ( self ) -> None : \"\"\" Comfirmation, currently do nothing. \"\"\" pass","title":"EvaluationAgent"},{"location":"agents/evaluation_agent/#evaluationagent","text":"The objective of the EvaluationAgent is to evaluate whether a Session or Round has been successfully completed. The EvaluationAgent assesses the performance of the HostAgent and AppAgent in fulfilling the request. You can configure whether to enable the EvaluationAgent in the config_dev.yaml file and the detailed documentation can be found here . Note The EvaluationAgent is fully LLM-driven and conducts evaluations based on the action trajectories and screenshots. It may not by 100% accurate since LLM may make mistakes.","title":"EvaluationAgent \ud83e\uddd0"},{"location":"agents/evaluation_agent/#configuration","text":"To enable the EvaluationAgent , you can configure the following parameters in the config_dev.yaml file to evaluate the task completion status at different levels: Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True","title":"Configuration"},{"location":"agents/evaluation_agent/#evaluation-inputs","text":"The EvaluationAgent takes the following inputs for evaluation: Input Description Type User Request The user's request to be evaluated. String APIs Description The description of the APIs used in the execution. List of Strings Action Trajectories The action trajectories executed by the HostAgent and AppAgent . List of Strings Screenshots The screenshots captured during the execution. List of Images For more details on how to construct the inputs, please refer to the EvaluationAgentPrompter class in ufo/prompter/eva_prompter.py . Tip You can configure whether to use all screenshots or only the first and last screenshot for evaluation in the EVA_ALL_SCREENSHOTS of the config_dev.yaml file.","title":"Evaluation Inputs"},{"location":"agents/evaluation_agent/#evaluation-outputs","text":"The EvaluationAgent generates the following outputs after evaluation: Output Description Type reason The detailed reason for your judgment, by observing the screenshot differences and the . String sub_scores The sub-score of the evaluation in decomposing the evaluation into multiple sub-goals. List of Dictionaries complete The completion status of the evaluation, can be yes , no , or unsure . String Below is an example of the evaluation output: { \"reason\": \"The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams. The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open. The agent then focused on the chat window, input the message 'hello', and clicked the Send button. The final screenshot confirms that the message 'hello' was sent to Zac.\", \"sub_scores\": { \"correct application focus\": \"yes\", \"correct message input\": \"yes\", \"message sent successfully\": \"yes\" }, \"complete\": \"yes\"} Info The log of the evaluation results will be saved in the logs/{task_name}/evaluation.log file. The EvaluationAgent employs the CoT mechanism to first decompose the evaluation into multiple sub-goals and then evaluate each sub-goal separately. The sub-scores are then aggregated to determine the overall completion status of the evaluation.","title":"Evaluation Outputs"},{"location":"agents/evaluation_agent/#reference","text":"Bases: BasicAgent The agent for evaluation. Initialize the FollowAgent. :agent_type: The type of the agent. :is_visual: The flag indicating whether the agent is visual or not. Source code in agents/agent/evaluation_agent.py 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def __init__ ( self , name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the FollowAgent. :agent_type: The type of the agent. :is_visual: The flag indicating whether the agent is visual or not. \"\"\" super () . __init__ ( name = name ) self . _app_root_name = app_root_name self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name , )","title":"Reference"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.status_manager","text":"Get the status manager.","title":"status_manager"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.evaluate","text":"Evaluate the task completion. Parameters: log_path ( str ) \u2013 The path to the log file. Returns: Tuple [ Dict [ str , str ], float ] \u2013 The evaluation result and the cost of LLM. Source code in agents/agent/evaluation_agent.py 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 def evaluate ( self , request : str , log_path : str , eva_all_screenshots : bool = True ) -> Tuple [ Dict [ str , str ], float ]: \"\"\" Evaluate the task completion. :param log_path: The path to the log file. :return: The evaluation result and the cost of LLM. \"\"\" message = self . message_constructor ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) result , cost = self . get_response ( message = message , namescope = \"app\" , use_backup_engine = True ) result = json_parser ( result ) return result , cost","title":"evaluate"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.get_prompter","text":"Get the prompter for the agent. Source code in agents/agent/evaluation_agent.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def get_prompter ( self , is_visual , prompt_template : str , example_prompt_template : str , api_prompt_template : str , root_name : Optional [ str ] = None , ) -> EvaluationAgentPrompter : \"\"\" Get the prompter for the agent. \"\"\" return EvaluationAgentPrompter ( is_visual = is_visual , prompt_template = prompt_template , example_prompt_template = example_prompt_template , api_prompt_template = api_prompt_template , root_name = root_name , )","title":"get_prompter"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.message_constructor","text":"Construct the message. Parameters: log_path ( str ) \u2013 The path to the log file. request ( str ) \u2013 The request. eva_all_screenshots ( bool , default: True ) \u2013 The flag indicating whether to evaluate all screenshots. Returns: Dict [ str , Any ] \u2013 The message. Source code in agents/agent/evaluation_agent.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 def message_constructor ( self , log_path : str , request : str , eva_all_screenshots : bool = True ) -> Dict [ str , Any ]: \"\"\" Construct the message. :param log_path: The path to the log file. :param request: The request. :param eva_all_screenshots: The flag indicating whether to evaluate all screenshots. :return: The message. \"\"\" evaagent_prompt_system_message = self . prompter . system_prompt_construction () evaagent_prompt_user_message = self . prompter . user_content_construction ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) evaagent_prompt_message = self . prompter . prompt_construction ( evaagent_prompt_system_message , evaagent_prompt_user_message ) return evaagent_prompt_message","title":"message_constructor"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.print_response","text":"Print the response of the evaluation. Parameters: response_dict ( Dict [ str , Any ] ) \u2013 The response dictionary. Source code in agents/agent/evaluation_agent.py 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 def print_response ( self , response_dict : Dict [ str , Any ]) -> None : \"\"\" Print the response of the evaluation. :param response_dict: The response dictionary. \"\"\" emoji_map = { \"yes\" : \"\u2705\" , \"no\" : \"\u274c\" , \"maybe\" : \"\u2753\" , } complete = emoji_map . get ( response_dict . get ( \"complete\" ), response_dict . get ( \"complete\" ) ) sub_scores = response_dict . get ( \"sub_scores\" , {}) reason = response_dict . get ( \"reason\" , \"\" ) print_with_color ( f \"Evaluation result\ud83e\uddd0:\" , \"magenta\" ) print_with_color ( f \"[Sub-scores\ud83d\udcca:]\" , \"green\" ) for score , evaluation in sub_scores . items (): print_with_color ( f \" { score } : { emoji_map . get ( evaluation , evaluation ) } \" , \"green\" ) print_with_color ( \"[Task is complete\ud83d\udcaf:] {complete} \" . format ( complete = complete ), \"cyan\" ) print_with_color ( f \"[Reason\ud83e\udd14:] { reason } \" . format ( reason = reason ), \"blue\" )","title":"print_response"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.process_comfirmation","text":"Comfirmation, currently do nothing. Source code in agents/agent/evaluation_agent.py 124 125 126 127 128 def process_comfirmation ( self ) -> None : \"\"\" Comfirmation, currently do nothing. \"\"\" pass","title":"process_comfirmation"},{"location":"agents/follower_agent/","text":"Follower Agent \ud83d\udeb6\ud83c\udffd\u200d\u2642\ufe0f The FollowerAgent is inherited from the AppAgent and is responsible for following the user's instructions to perform specific tasks within the application. The FollowerAgent is designed to execute a series of actions based on the user's guidance. It is particularly useful for software testing, when clear instructions are provided to validate the application's behavior. Different from the AppAgent The FollowerAgent shares most of the functionalities with the AppAgent , but it is designed to follow the step-by-step instructions provided by the user, instead of does its own reasoning to determine the next action. Usage The FollowerAgent is available in follower mode. You can find more details in the documentation . It also uses differnt Session and Processor to handle the user's instructions. The step-wise instructions are provided by the user in the in a json file, which is then parsed by the FollowerAgent to execute the actions. An example of the json file is shown below: { \"task\": \"Type in a bold text of 'Test For Fun'\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.select the text of 'Test For Fun'\", \"3.click on the bold\" ], \"object\": \"draft.docx\" } Reference Bases: AppAgent The FollowerAgent class the manager of a FollowedAgent that follows the step-by-step instructions for action execution within an application. It is a subclass of the AppAgent, which completes the action execution within the application. Initialize the FollowAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. Source code in agents/agent/follower_agent.py 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , ): \"\"\" Initialize the FollowAgent. :param name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_info_prompt: The app information prompt file path. \"\"\" super () . __init__ ( name = name , process_name = process_name , app_root_name = app_root_name , is_visual = is_visual , main_prompt = main_prompt , example_prompt = example_prompt , api_prompt = api_prompt , skip_prompter = True , ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , ) get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name = '' ) Get the prompter for the follower agent. Parameters: is_visual ( str ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. app_root_name ( str , default: '' ) \u2013 The root name of the app. Returns: FollowerAgentPrompter \u2013 The prompter instance. Source code in agents/agent/follower_agent.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 def get_prompter ( self , is_visual : str , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , app_root_name : str = \"\" , ) -> FollowerAgentPrompter : \"\"\" Get the prompter for the follower agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_info_prompt: The app information prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. \"\"\" return FollowerAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , ) message_constructor ( dynamic_examples , dynamic_tips , dynamic_knowledge , image_list , control_info , prev_subtask , plan , request , subtask , host_message , current_state , state_diff , include_last_screenshot ) Construct the prompt message for the FollowAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the self-demonstration and human demonstration. image_list ( List [ str ] ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. prev_subtask ( List [ str ] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. subtask ( str ) \u2013 The subtask. host_message ( List [ str ] ) \u2013 The host message. current_state ( Dict [ str , str ] ) \u2013 The current state of the app. state_diff ( Dict [ str , str ] ) \u2013 The state difference between the current state and the previous state. include_last_screenshot ( bool ) \u2013 The flag indicating whether the last screenshot should be included. Returns: List [ Dict [ str , str ]] \u2013 The prompt message. Source code in agents/agent/follower_agent.py 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List [ str ], control_info : str , prev_subtask : List [ str ], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], current_state : Dict [ str , str ], state_diff : Dict [ str , str ], include_last_screenshot : bool , ) -> List [ Dict [ str , str ]]: \"\"\" Construct the prompt message for the FollowAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the self-demonstration and human demonstration. :param image_list: The list of screenshot images. :param control_info: The control information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :param subtask: The subtask. :param host_message: The host message. :param current_state: The current state of the app. :param state_diff: The state difference between the current state and the previous state. :param include_last_screenshot: The flag indicating whether the last screenshot should be included. :return: The prompt message. \"\"\" followagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) followagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . _process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , current_state = current_state , state_diff = state_diff , include_last_screenshot = include_last_screenshot , ) followagent_prompt_message = self . prompter . prompt_construction ( followagent_prompt_system_message , followagent_prompt_user_message ) return followagent_prompt_message","title":"FollowerAgent"},{"location":"agents/follower_agent/#follower-agent","text":"The FollowerAgent is inherited from the AppAgent and is responsible for following the user's instructions to perform specific tasks within the application. The FollowerAgent is designed to execute a series of actions based on the user's guidance. It is particularly useful for software testing, when clear instructions are provided to validate the application's behavior.","title":"Follower Agent \ud83d\udeb6\ud83c\udffd\u200d\u2642\ufe0f"},{"location":"agents/follower_agent/#different-from-the-appagent","text":"The FollowerAgent shares most of the functionalities with the AppAgent , but it is designed to follow the step-by-step instructions provided by the user, instead of does its own reasoning to determine the next action.","title":"Different from the AppAgent"},{"location":"agents/follower_agent/#usage","text":"The FollowerAgent is available in follower mode. You can find more details in the documentation . It also uses differnt Session and Processor to handle the user's instructions. The step-wise instructions are provided by the user in the in a json file, which is then parsed by the FollowerAgent to execute the actions. An example of the json file is shown below: { \"task\": \"Type in a bold text of 'Test For Fun'\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.select the text of 'Test For Fun'\", \"3.click on the bold\" ], \"object\": \"draft.docx\" }","title":"Usage"},{"location":"agents/follower_agent/#reference","text":"Bases: AppAgent The FollowerAgent class the manager of a FollowedAgent that follows the step-by-step instructions for action execution within an application. It is a subclass of the AppAgent, which completes the action execution within the application. Initialize the FollowAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. Source code in agents/agent/follower_agent.py 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , ): \"\"\" Initialize the FollowAgent. :param name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_info_prompt: The app information prompt file path. \"\"\" super () . __init__ ( name = name , process_name = process_name , app_root_name = app_root_name , is_visual = is_visual , main_prompt = main_prompt , example_prompt = example_prompt , api_prompt = api_prompt , skip_prompter = True , ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , )","title":"Reference"},{"location":"agents/follower_agent/#agents.agent.follower_agent.FollowerAgent.get_prompter","text":"Get the prompter for the follower agent. Parameters: is_visual ( str ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. app_root_name ( str , default: '' ) \u2013 The root name of the app. Returns: FollowerAgentPrompter \u2013 The prompter instance. Source code in agents/agent/follower_agent.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 def get_prompter ( self , is_visual : str , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , app_root_name : str = \"\" , ) -> FollowerAgentPrompter : \"\"\" Get the prompter for the follower agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_info_prompt: The app information prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. \"\"\" return FollowerAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , )","title":"get_prompter"},{"location":"agents/follower_agent/#agents.agent.follower_agent.FollowerAgent.message_constructor","text":"Construct the prompt message for the FollowAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the self-demonstration and human demonstration. image_list ( List [ str ] ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. prev_subtask ( List [ str ] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. subtask ( str ) \u2013 The subtask. host_message ( List [ str ] ) \u2013 The host message. current_state ( Dict [ str , str ] ) \u2013 The current state of the app. state_diff ( Dict [ str , str ] ) \u2013 The state difference between the current state and the previous state. include_last_screenshot ( bool ) \u2013 The flag indicating whether the last screenshot should be included. Returns: List [ Dict [ str , str ]] \u2013 The prompt message. Source code in agents/agent/follower_agent.py 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List [ str ], control_info : str , prev_subtask : List [ str ], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], current_state : Dict [ str , str ], state_diff : Dict [ str , str ], include_last_screenshot : bool , ) -> List [ Dict [ str , str ]]: \"\"\" Construct the prompt message for the FollowAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the self-demonstration and human demonstration. :param image_list: The list of screenshot images. :param control_info: The control information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :param subtask: The subtask. :param host_message: The host message. :param current_state: The current state of the app. :param state_diff: The state difference between the current state and the previous state. :param include_last_screenshot: The flag indicating whether the last screenshot should be included. :return: The prompt message. \"\"\" followagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) followagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . _process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , current_state = current_state , state_diff = state_diff , include_last_screenshot = include_last_screenshot , ) followagent_prompt_message = self . prompter . prompt_construction ( followagent_prompt_system_message , followagent_prompt_user_message ) return followagent_prompt_message","title":"message_constructor"},{"location":"agents/host_agent/","text":"HostAgent \ud83e\udd16 The HostAgent assumes three primary responsibilities: User Engagement : The HostAgent engages with the user to understand their request and analyze their intent. It also conversates with the user to gather additional information when necessary. AppAgent Management : The HostAgent manages the creation and registration of AppAgents to fulfill the user's request. It also orchestrates the interaction between the AppAgents and the application. Task Management : The HostAgent analyzes the user's request, to decompose it into sub-tasks and distribute them among the AppAgents . It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents to ensure the successful completion of the user's request. Bash Command Execution : The HostAgent can execute bash commands to open applications or execute system commands to support the user's request and the AppAgents ' execution. Communication : The HostAgent communicates with the AppAgents to exchange information. It also manages the Blackboard to store and share information among the agents, as shown below: The HostAgent activates its Processor to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent for execution. The HostAgent monitors the progress of the AppAgents and ensures the successful completion of the user's request. HostAgent Input The HostAgent receives the following inputs: Input Description Type User Request The user's request in natural language. String Application Information Information about the existing active applications. List of Strings Desktop Screenshots Screenshots of the desktop to provide context to the HostAgent . Image Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following sub-tasks. List of Strings Blackboard The shared memory space for storing and sharing information among the agents. Dictionary By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions. HostAgent Output With the inputs provided, the HostAgent generates the following outputs: Output Description Type Observation The observation of current desktop screenshots. String Thought The logical reasoning process of the HostAgent . String Current Sub-Task The current sub-task to be executed by the AppAgent . String Message The message to be sent to the AppAgent for the completion of the sub-task. String ControlLabel The index of the selected application to execute the sub-task. String ControlText The name of the selected application to execute the sub-task. String Plan The plan for the following sub-tasks after the current sub-task. List of Strings Status The status of the agent, mapped to the AgentState . String Comment Additional comments or information provided to the user. String Questions The questions to be asked to the user for additional information. List of Strings Bash The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. String Below is an example of the HostAgent output: { \"Observation\": \"Desktop screenshot\", \"Thought\": \"Logical reasoning process\", \"Current Sub-Task\": \"Sub-task description\", \"Message\": \"Message to AppAgent\", \"ControlLabel\": \"Application index\", \"ControlText\": \"Application name\", \"Plan\": [\"Sub-task 1\", \"Sub-task 2\"], \"Status\": \"AgentState\", \"Comment\": \"Additional comments\", \"Questions\": [\"Question 1\", \"Question 2\"], \"Bash\": \"Bash command\" } Info The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python. HostAgent State The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include: State Description CONTINUE The HostAgent is ready to process the user's request and emloy the Processor to decompose it into sub-tasks. ASSIGN The HostAgent is assigning the sub-tasks to the AppAgents for execution. FINISH The overall task is completed, and the HostAgent is ready to return the results to the user. ERROR An error occurred during the processing of the user's request, and the HostAgent is unable to proceed. FAIL The HostAgent believes the task is unachievable and cannot proceed further. PENDING The HostAgent is waiting for additional information from the user to proceed. The state machine diagram for the HostAgent is shown below: The HostAgent transitions between these states based on the user's request, the application information, and the progress of the AppAgents in executing the sub-tasks. Task Decomposition Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure: Creating and Registering AppAgents When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent , by calling the create_subagent method: def create_subagent( self, agent_type: str, agent_name: str, process_name: str, app_root_name: str, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str, *args, **kwargs, ) -> BasicAgent: \"\"\" Create an SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. \"\"\" app_agent = self.agent_factory.create_agent( agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs, ) self.appagent_dict[agent_name] = app_agent app_agent.host = self self._active_appagent = app_agent return app_agent The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress. Reference Bases: BasicAgent The HostAgent class the manager of AppAgents. Initialize the HostAgent. :name: The name of the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Source code in agents/agent/host_agent.py 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 def __init__ ( self , name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> None : \"\"\" Initialize the HostAgent. :name: The name of the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. \"\"\" super () . __init__ ( name = name ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . human_demonstration_retriever = None self . agent_factory = AgentFactory () self . appagent_dict = {} self . _active_appagent = None self . _blackboard = Blackboard () self . set_state ( ContinueHostAgentState ()) self . Puppeteer = self . create_puppeteer_interface () blackboard property Get the blackboard. status_manager : HostAgentStatus property Get the status manager. sub_agent_amount : int property Get the amount of sub agents. Returns: int \u2013 The amount of sub agents. create_app_agent ( application_window_name , application_root_name , request , mode ) Create the app agent for the host agent. Parameters: application_window_name ( str ) \u2013 The name of the application window. application_root_name ( str ) \u2013 The name of the application root. request ( str ) \u2013 The user request. mode ( str ) \u2013 The mode of the session. Returns: AppAgent \u2013 The app agent. Source code in agents/agent/host_agent.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 def create_app_agent ( self , application_window_name : str , application_root_name : str , request : str , mode : str , ) -> AppAgent : \"\"\" Create the app agent for the host agent. :param application_window_name: The name of the application window. :param application_root_name: The name of the application root. :param request: The user request. :param mode: The mode of the session. :return: The app agent. \"\"\" if mode == \"normal\" : agent_name = \"AppAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) app_agent : AppAgent = self . create_subagent ( agent_type = \"app\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"APPAGENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], ) elif mode == \"follower\" : # Load additional app info prompt. app_info_prompt = configs . get ( \"APP_INFO_PROMPT\" , None ) agent_name = \"FollowerAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) # Create the app agent in the follower mode. app_agent = self . create_subagent ( agent_type = \"follower\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"FOLLOWERAHENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], app_info_prompt = app_info_prompt , ) else : raise ValueError ( f \"The { mode } mode is not supported.\" ) # Create the COM receiver for the app agent. if configs . get ( \"USE_APIS\" , False ): app_agent . Puppeteer . receiver_manager . create_api_receiver ( application_root_name , application_window_name ) # Provision the context for the app agent, including the all retrievers. app_agent . context_provision ( request ) return app_agent create_puppeteer_interface () Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/host_agent.py 213 214 215 216 217 218 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. :return: The Puppeteer interface. \"\"\" return puppeteer . AppPuppeteer ( \"\" , \"\" ) create_subagent ( agent_type , agent_name , process_name , app_root_name , is_visual , main_prompt , example_prompt , api_prompt , * args , ** kwargs ) Create an SubAgent hosted by the HostAgent. Parameters: agent_type ( str ) \u2013 The type of the agent to create. agent_name ( str ) \u2013 The name of the SubAgent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: BasicAgent \u2013 The created SubAgent. Source code in agents/agent/host_agent.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def create_subagent ( self , agent_type : str , agent_name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , * args , ** kwargs , ) -> BasicAgent : \"\"\" Create an SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. \"\"\" app_agent = self . agent_factory . create_agent ( agent_type , agent_name , process_name , app_root_name , is_visual , main_prompt , example_prompt , api_prompt , * args , ** kwargs , ) self . appagent_dict [ agent_name ] = app_agent app_agent . host = self self . _active_appagent = app_agent return app_agent get_active_appagent () Get the active app agent. Returns: AppAgent \u2013 The active app agent. Source code in agents/agent/host_agent.py 150 151 152 153 154 155 def get_active_appagent ( self ) -> AppAgent : \"\"\" Get the active app agent. :return: The active app agent. \"\"\" return self . _active_appagent get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: HostAgentPrompter \u2013 The prompter instance. Source code in agents/agent/host_agent.py 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> HostAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The prompter instance. \"\"\" return HostAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt ) message_constructor ( image_list , os_info , plan , prev_subtask , request ) Construct the message. Parameters: image_list ( List [ str ] ) \u2013 The list of screenshot images. os_info ( str ) \u2013 The OS information. prev_subtask ( List [ Dict [ str , str ]] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/host_agent.py 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 def message_constructor ( self , image_list : List [ str ], os_info : str , plan : List [ str ], prev_subtask : List [ Dict [ str , str ]], request : str , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :param image_list: The list of screenshot images. :param os_info: The OS information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :return: The message. \"\"\" hostagent_prompt_system_message = self . prompter . system_prompt_construction () hostagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = os_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . blackboard_to_prompt () hostagent_prompt_user_message = ( blackboard_prompt + hostagent_prompt_user_message ) hostagent_prompt_message = self . prompter . prompt_construction ( hostagent_prompt_system_message , hostagent_prompt_user_message ) return hostagent_prompt_message print_response ( response_dict ) Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/host_agent.py 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. \"\"\" application = response_dict . get ( \"ControlText\" ) if not application : application = \"[The required application needs to be opened.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) bash_command = response_dict . get ( \"Bash\" , None ) subtask = response_dict . get ( \"CurrentSubtask\" ) # Convert the message from a list to a string. message = list ( response_dict . get ( \"Message\" , \"\" )) message = \" \\n \" . join ( message ) # Concatenate the subtask with the plan and convert the plan from a list to a string. plan = list ( response_dict . get ( \"Plan\" )) plan = [ subtask ] + plan plan = \" \\n \" . join ([ f \"( { i + 1 } ) \" + str ( item ) for i , item in enumerate ( plan )]) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) utils . print_with_color ( \"Observations\ud83d\udc40: {observation} \" . format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) if bash_command : utils . print_with_color ( \"Running Bash Command\ud83d\udd27: {bash} \" . format ( bash = bash_command ), \"yellow\" ) utils . print_with_color ( \"Plans\ud83d\udcda: {plan} \" . format ( plan = plan ), \"cyan\" , ) utils . print_with_color ( \"Next Selected application\ud83d\udcf2: {application} \" . format ( application = application ), \"yellow\" , ) utils . print_with_color ( \"Messages to AppAgent\ud83d\udce9: {message} \" . format ( message = message ), \"cyan\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" ) process ( context ) Process the agent. Parameters: context ( Context ) \u2013 The context. Source code in agents/agent/host_agent.py 202 203 204 205 206 207 208 209 210 211 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = HostAgentProcessor ( agent = self , context = context ) self . processor . process () # Sync the status with the processor. self . status = self . processor . status process_comfirmation () TODO: Process the confirmation. Source code in agents/agent/host_agent.py 289 290 291 292 293 def process_comfirmation ( self ) -> None : \"\"\" TODO: Process the confirmation. \"\"\" pass","title":"HostAgent"},{"location":"agents/host_agent/#hostagent","text":"The HostAgent assumes three primary responsibilities: User Engagement : The HostAgent engages with the user to understand their request and analyze their intent. It also conversates with the user to gather additional information when necessary. AppAgent Management : The HostAgent manages the creation and registration of AppAgents to fulfill the user's request. It also orchestrates the interaction between the AppAgents and the application. Task Management : The HostAgent analyzes the user's request, to decompose it into sub-tasks and distribute them among the AppAgents . It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents to ensure the successful completion of the user's request. Bash Command Execution : The HostAgent can execute bash commands to open applications or execute system commands to support the user's request and the AppAgents ' execution. Communication : The HostAgent communicates with the AppAgents to exchange information. It also manages the Blackboard to store and share information among the agents, as shown below:","title":"HostAgent \ud83e\udd16"},{"location":"agents/host_agent/#hostagent-input","text":"The HostAgent receives the following inputs: Input Description Type User Request The user's request in natural language. String Application Information Information about the existing active applications. List of Strings Desktop Screenshots Screenshots of the desktop to provide context to the HostAgent . Image Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following sub-tasks. List of Strings Blackboard The shared memory space for storing and sharing information among the agents. Dictionary By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions.","title":"HostAgent Input"},{"location":"agents/host_agent/#hostagent-output","text":"With the inputs provided, the HostAgent generates the following outputs: Output Description Type Observation The observation of current desktop screenshots. String Thought The logical reasoning process of the HostAgent . String Current Sub-Task The current sub-task to be executed by the AppAgent . String Message The message to be sent to the AppAgent for the completion of the sub-task. String ControlLabel The index of the selected application to execute the sub-task. String ControlText The name of the selected application to execute the sub-task. String Plan The plan for the following sub-tasks after the current sub-task. List of Strings Status The status of the agent, mapped to the AgentState . String Comment Additional comments or information provided to the user. String Questions The questions to be asked to the user for additional information. List of Strings Bash The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. String Below is an example of the HostAgent output: { \"Observation\": \"Desktop screenshot\", \"Thought\": \"Logical reasoning process\", \"Current Sub-Task\": \"Sub-task description\", \"Message\": \"Message to AppAgent\", \"ControlLabel\": \"Application index\", \"ControlText\": \"Application name\", \"Plan\": [\"Sub-task 1\", \"Sub-task 2\"], \"Status\": \"AgentState\", \"Comment\": \"Additional comments\", \"Questions\": [\"Question 1\", \"Question 2\"], \"Bash\": \"Bash command\" } Info The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.","title":"HostAgent Output"},{"location":"agents/host_agent/#hostagent-state","text":"The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include: State Description CONTINUE The HostAgent is ready to process the user's request and emloy the Processor to decompose it into sub-tasks. ASSIGN The HostAgent is assigning the sub-tasks to the AppAgents for execution. FINISH The overall task is completed, and the HostAgent is ready to return the results to the user. ERROR An error occurred during the processing of the user's request, and the HostAgent is unable to proceed. FAIL The HostAgent believes the task is unachievable and cannot proceed further. PENDING The HostAgent is waiting for additional information from the user to proceed. The state machine diagram for the HostAgent is shown below:","title":"HostAgent State"},{"location":"agents/host_agent/#task-decomposition","text":"Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:","title":"Task Decomposition"},{"location":"agents/host_agent/#creating-and-registering-appagents","text":"When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent , by calling the create_subagent method: def create_subagent( self, agent_type: str, agent_name: str, process_name: str, app_root_name: str, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str, *args, **kwargs, ) -> BasicAgent: \"\"\" Create an SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. \"\"\" app_agent = self.agent_factory.create_agent( agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs, ) self.appagent_dict[agent_name] = app_agent app_agent.host = self self._active_appagent = app_agent return app_agent The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress.","title":"Creating and Registering AppAgents"},{"location":"agents/host_agent/#reference","text":"Bases: BasicAgent The HostAgent class the manager of AppAgents. Initialize the HostAgent. :name: The name of the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Source code in agents/agent/host_agent.py 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 def __init__ ( self , name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> None : \"\"\" Initialize the HostAgent. :name: The name of the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. \"\"\" super () . __init__ ( name = name ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . human_demonstration_retriever = None self . agent_factory = AgentFactory () self . appagent_dict = {} self . _active_appagent = None self . _blackboard = Blackboard () self . set_state ( ContinueHostAgentState ()) self . Puppeteer = self . create_puppeteer_interface ()","title":"Reference"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.blackboard","text":"Get the blackboard.","title":"blackboard"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.status_manager","text":"Get the status manager.","title":"status_manager"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.sub_agent_amount","text":"Get the amount of sub agents. Returns: int \u2013 The amount of sub agents.","title":"sub_agent_amount"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.create_app_agent","text":"Create the app agent for the host agent. Parameters: application_window_name ( str ) \u2013 The name of the application window. application_root_name ( str ) \u2013 The name of the application root. request ( str ) \u2013 The user request. mode ( str ) \u2013 The mode of the session. Returns: AppAgent \u2013 The app agent. Source code in agents/agent/host_agent.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 def create_app_agent ( self , application_window_name : str , application_root_name : str , request : str , mode : str , ) -> AppAgent : \"\"\" Create the app agent for the host agent. :param application_window_name: The name of the application window. :param application_root_name: The name of the application root. :param request: The user request. :param mode: The mode of the session. :return: The app agent. \"\"\" if mode == \"normal\" : agent_name = \"AppAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) app_agent : AppAgent = self . create_subagent ( agent_type = \"app\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"APPAGENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], ) elif mode == \"follower\" : # Load additional app info prompt. app_info_prompt = configs . get ( \"APP_INFO_PROMPT\" , None ) agent_name = \"FollowerAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) # Create the app agent in the follower mode. app_agent = self . create_subagent ( agent_type = \"follower\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"FOLLOWERAHENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], app_info_prompt = app_info_prompt , ) else : raise ValueError ( f \"The { mode } mode is not supported.\" ) # Create the COM receiver for the app agent. if configs . get ( \"USE_APIS\" , False ): app_agent . Puppeteer . receiver_manager . create_api_receiver ( application_root_name , application_window_name ) # Provision the context for the app agent, including the all retrievers. app_agent . context_provision ( request ) return app_agent","title":"create_app_agent"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.create_puppeteer_interface","text":"Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/host_agent.py 213 214 215 216 217 218 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. :return: The Puppeteer interface. \"\"\" return puppeteer . AppPuppeteer ( \"\" , \"\" )","title":"create_puppeteer_interface"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.create_subagent","text":"Create an SubAgent hosted by the HostAgent. Parameters: agent_type ( str ) \u2013 The type of the agent to create. agent_name ( str ) \u2013 The name of the SubAgent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: BasicAgent \u2013 The created SubAgent. Source code in agents/agent/host_agent.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def create_subagent ( self , agent_type : str , agent_name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , * args , ** kwargs , ) -> BasicAgent : \"\"\" Create an SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. \"\"\" app_agent = self . agent_factory . create_agent ( agent_type , agent_name , process_name , app_root_name , is_visual , main_prompt , example_prompt , api_prompt , * args , ** kwargs , ) self . appagent_dict [ agent_name ] = app_agent app_agent . host = self self . _active_appagent = app_agent return app_agent","title":"create_subagent"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.get_active_appagent","text":"Get the active app agent. Returns: AppAgent \u2013 The active app agent. Source code in agents/agent/host_agent.py 150 151 152 153 154 155 def get_active_appagent ( self ) -> AppAgent : \"\"\" Get the active app agent. :return: The active app agent. \"\"\" return self . _active_appagent","title":"get_active_appagent"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.get_prompter","text":"Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: HostAgentPrompter \u2013 The prompter instance. Source code in agents/agent/host_agent.py 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> HostAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The prompter instance. \"\"\" return HostAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt )","title":"get_prompter"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.message_constructor","text":"Construct the message. Parameters: image_list ( List [ str ] ) \u2013 The list of screenshot images. os_info ( str ) \u2013 The OS information. prev_subtask ( List [ Dict [ str , str ]] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/host_agent.py 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 def message_constructor ( self , image_list : List [ str ], os_info : str , plan : List [ str ], prev_subtask : List [ Dict [ str , str ]], request : str , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :param image_list: The list of screenshot images. :param os_info: The OS information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :return: The message. \"\"\" hostagent_prompt_system_message = self . prompter . system_prompt_construction () hostagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = os_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . blackboard_to_prompt () hostagent_prompt_user_message = ( blackboard_prompt + hostagent_prompt_user_message ) hostagent_prompt_message = self . prompter . prompt_construction ( hostagent_prompt_system_message , hostagent_prompt_user_message ) return hostagent_prompt_message","title":"message_constructor"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.print_response","text":"Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/host_agent.py 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. \"\"\" application = response_dict . get ( \"ControlText\" ) if not application : application = \"[The required application needs to be opened.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) bash_command = response_dict . get ( \"Bash\" , None ) subtask = response_dict . get ( \"CurrentSubtask\" ) # Convert the message from a list to a string. message = list ( response_dict . get ( \"Message\" , \"\" )) message = \" \\n \" . join ( message ) # Concatenate the subtask with the plan and convert the plan from a list to a string. plan = list ( response_dict . get ( \"Plan\" )) plan = [ subtask ] + plan plan = \" \\n \" . join ([ f \"( { i + 1 } ) \" + str ( item ) for i , item in enumerate ( plan )]) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) utils . print_with_color ( \"Observations\ud83d\udc40: {observation} \" . format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) if bash_command : utils . print_with_color ( \"Running Bash Command\ud83d\udd27: {bash} \" . format ( bash = bash_command ), \"yellow\" ) utils . print_with_color ( \"Plans\ud83d\udcda: {plan} \" . format ( plan = plan ), \"cyan\" , ) utils . print_with_color ( \"Next Selected application\ud83d\udcf2: {application} \" . format ( application = application ), \"yellow\" , ) utils . print_with_color ( \"Messages to AppAgent\ud83d\udce9: {message} \" . format ( message = message ), \"cyan\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" )","title":"print_response"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.process","text":"Process the agent. Parameters: context ( Context ) \u2013 The context. Source code in agents/agent/host_agent.py 202 203 204 205 206 207 208 209 210 211 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = HostAgentProcessor ( agent = self , context = context ) self . processor . process () # Sync the status with the processor. self . status = self . processor . status","title":"process"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.process_comfirmation","text":"TODO: Process the confirmation. Source code in agents/agent/host_agent.py 289 290 291 292 293 def process_comfirmation ( self ) -> None : \"\"\" TODO: Process the confirmation. \"\"\" pass","title":"process_comfirmation"},{"location":"agents/overview/","text":"Agents In UFO, there are four types of agents: HostAgent , AppAgent , FollowerAgent , and EvaluationAgent . Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process: Agent Description HostAgent Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. AppAgent Executes actions on the selected application. FollowerAgent Follows the user's instructions to complete the task. EvaluationAgent Evaluates the completeness of a session or a round. In the normal workflow, only the HostAgent and AppAgent are involved in the user interaction process. The FollowerAgent and EvaluationAgent are used for specific tasks. Please see below the orchestration of the agents in UFO: Main Components An agent in UFO is composed of the following main components to fulfill its role in the UFO system: Component Description State Represents the current state of the agent and determines the next action and agent to handle the request. Memory Stores information about the user request, application state, and other relevant data. Blackboard Stores information shared between agents. Prompter Generates prompts for the language model based on the user request and application state. Processor Processes the workflow of the agent, including handling user requests, executing actions, and memory management. Reference Below is the reference for the Agent class in UFO. All agents in UFO inherit from the Agent class and implement necessary methods to fulfill their roles in the UFO system. Bases: ABC The BasicAgent class is the abstract class for the agent. Initialize the BasicAgent. Parameters: name ( str ) \u2013 The name of the agent. Source code in agents/agent/basic.py 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 def __init__ ( self , name : str ) -> None : \"\"\" Initialize the BasicAgent. :param name: The name of the agent. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = self . status_manager . CONTINUE . value self . _register_self () self . retriever_factory = retriever . RetrieverFactory () self . _memory = Memory () self . _host = None self . _processor : Optional [ BaseProcessor ] = None self . _state = None self . Puppeteer : puppeteer . AppPuppeteer = None blackboard : Blackboard property Get the blackboard. Returns: Blackboard \u2013 The blackboard. host : HostAgent property writable Get the host of the agent. Returns: HostAgent \u2013 The host of the agent. memory : Memory property Get the memory of the agent. Returns: Memory \u2013 The memory of the agent. name : str property Get the name of the agent. Returns: str \u2013 The name of the agent. processor : BaseProcessor property writable Get the processor. Returns: BaseProcessor \u2013 The processor. state : AgentState property Get the state of the agent. Returns: AgentState \u2013 The state of the agent. status : str property writable Get the status of the agent. Returns: str \u2013 The status of the agent. status_manager : AgentStatus property Get the status manager. Returns: AgentStatus \u2013 The status manager. step : int property writable Get the step of the agent. Returns: int \u2013 The step of the agent. add_memory ( memory_item ) Update the memory of the agent. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/agent/basic.py 181 182 183 184 185 186 def add_memory ( self , memory_item : MemoryItem ) -> None : \"\"\" Update the memory of the agent. :param memory_item: The memory item to add. \"\"\" self . _memory . add_memory_item ( memory_item ) build_experience_retriever () Build the experience retriever. Source code in agents/agent/basic.py 323 324 325 326 327 def build_experience_retriever ( self ) -> None : \"\"\" Build the experience retriever. \"\"\" pass build_human_demonstration_retriever () Build the human demonstration retriever. Source code in agents/agent/basic.py 329 330 331 332 333 def build_human_demonstration_retriever ( self ) -> None : \"\"\" Build the human demonstration retriever. \"\"\" pass build_offline_docs_retriever () Build the offline docs retriever. Source code in agents/agent/basic.py 311 312 313 314 315 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" pass build_online_search_retriever () Build the online search retriever. Source code in agents/agent/basic.py 317 318 319 320 321 def build_online_search_retriever ( self ) -> None : \"\"\" Build the online search retriever. \"\"\" pass clear_memory () Clear the memory of the agent. Source code in agents/agent/basic.py 195 196 197 198 199 def clear_memory ( self ) -> None : \"\"\" Clear the memory of the agent. \"\"\" self . _memory . clear () create_puppeteer_interface () Create the puppeteer interface. Source code in agents/agent/basic.py 233 234 235 236 237 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the puppeteer interface. \"\"\" pass delete_memory ( step ) Delete the memory of the agent. Parameters: step ( int ) \u2013 The step of the memory item to delete. Source code in agents/agent/basic.py 188 189 190 191 192 193 def delete_memory ( self , step : int ) -> None : \"\"\" Delete the memory of the agent. :param step: The step of the memory item to delete. \"\"\" self . _memory . delete_memory_item ( step ) get_cls ( name ) classmethod Retrieves an agent class from the registry. Parameters: name ( str ) \u2013 The name of the agent class. Returns: Type ['BasicAgent'] \u2013 The agent class. Source code in agents/agent/basic.py 350 351 352 353 354 355 356 357 @classmethod def get_cls ( cls , name : str ) -> Type [ \"BasicAgent\" ]: \"\"\" Retrieves an agent class from the registry. :param name: The name of the agent class. :return: The agent class. \"\"\" return AgentRegistry () . get_cls ( name ) get_prompter () abstractmethod Get the prompt for the agent. Returns: str \u2013 The prompt. Source code in agents/agent/basic.py 124 125 126 127 128 129 130 @abstractmethod def get_prompter ( self ) -> str : \"\"\" Get the prompt for the agent. :return: The prompt. \"\"\" pass get_response ( message , namescope , use_backup_engine , configs = configs ) classmethod Get the response for the prompt. Parameters: message ( List [ dict ] ) \u2013 The message for LLMs. namescope ( str ) \u2013 The namescope for the LLMs. use_backup_engine ( bool ) \u2013 Whether to use the backup engine. Returns: str \u2013 The response. Source code in agents/agent/basic.py 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 @classmethod def get_response ( cls , message : List [ dict ], namescope : str , use_backup_engine : bool , configs = configs ) -> str : \"\"\" Get the response for the prompt. :param message: The message for LLMs. :param namescope: The namescope for the LLMs. :param use_backup_engine: Whether to use the backup engine. :return: The response. \"\"\" response_string , cost = llm_call . get_completion ( message , namescope , use_backup_engine = use_backup_engine , configs = configs ) return response_string , cost handle ( context ) Handle the agent. Parameters: context ( Context ) \u2013 The context for the agent. Source code in agents/agent/basic.py 220 221 222 223 224 225 def handle ( self , context : Context ) -> None : \"\"\" Handle the agent. :param context: The context for the agent. \"\"\" self . state . handle ( self , context ) message_constructor () abstractmethod Construct the message. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/basic.py 132 133 134 135 136 137 138 @abstractmethod def message_constructor ( self ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :return: The message. \"\"\" pass print_response () Print the response. Source code in agents/agent/basic.py 335 336 337 338 339 def print_response ( self ) -> None : \"\"\" Print the response. \"\"\" pass process ( context ) Process the agent. Source code in agents/agent/basic.py 227 228 229 230 231 def process ( self , context : Context ) -> None : \"\"\" Process the agent. \"\"\" pass process_asker ( ask_user = True ) Ask for the process. Parameters: ask_user ( bool , default: True ) \u2013 Whether to ask the user for the questions. Source code in agents/agent/basic.py 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 def process_asker ( self , ask_user : bool = True ) -> None : \"\"\" Ask for the process. :param ask_user: Whether to ask the user for the questions. \"\"\" if self . processor : question_list = self . processor . question_list if ask_user : utils . print_with_color ( \"Could you please answer the following questions to help me understand your needs and complete the task?\" , \"yellow\" , ) for index , question in enumerate ( question_list ): if ask_user : answer = question_asker ( question , index + 1 ) if not answer . strip (): continue qa_pair = { \"question\" : question , \"answer\" : answer } utils . append_string_to_file ( configs [ \"QA_PAIR_FILE\" ], json . dumps ( qa_pair ) ) else : qa_pair = { \"question\" : question , \"answer\" : \"The answer for the question is not available, please proceed with your own knowledge or experience, or leave it as a placeholder. Do not ask the same question again.\" , } self . blackboard . add_questions ( qa_pair ) process_comfirmation () abstractmethod Confirm the process. Source code in agents/agent/basic.py 280 281 282 283 284 285 @abstractmethod def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. \"\"\" pass process_resume () Resume the process. Source code in agents/agent/basic.py 239 240 241 242 243 244 def process_resume ( self ) -> None : \"\"\" Resume the process. \"\"\" if self . processor : self . processor . resume () reflection () TODO: Reflect on the action. Source code in agents/agent/basic.py 201 202 203 204 205 206 def reflection ( self ) -> None : \"\"\" TODO: Reflect on the action. \"\"\" pass response_to_dict ( response ) staticmethod Convert the response to a dictionary. Parameters: response ( str ) \u2013 The response. Returns: Dict [ str , str ] \u2013 The dictionary. Source code in agents/agent/basic.py 156 157 158 159 160 161 162 163 @staticmethod def response_to_dict ( response : str ) -> Dict [ str , str ]: \"\"\" Convert the response to a dictionary. :param response: The response. :return: The dictionary. \"\"\" return utils . json_parser ( response ) set_state ( state ) Set the state of the agent. Parameters: state ( AgentState ) \u2013 The state of the agent. Source code in agents/agent/basic.py 208 209 210 211 212 213 214 215 216 217 218 def set_state ( self , state : AgentState ) -> None : \"\"\" Set the state of the agent. :param state: The state of the agent. \"\"\" assert issubclass ( type ( self ), state . agent_class () ), f \"The state is only for agent type of { state . agent_class () } , but the current agent is { type ( self ) } .\" self . _state = state","title":"Overview"},{"location":"agents/overview/#agents","text":"In UFO, there are four types of agents: HostAgent , AppAgent , FollowerAgent , and EvaluationAgent . Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process: Agent Description HostAgent Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. AppAgent Executes actions on the selected application. FollowerAgent Follows the user's instructions to complete the task. EvaluationAgent Evaluates the completeness of a session or a round. In the normal workflow, only the HostAgent and AppAgent are involved in the user interaction process. The FollowerAgent and EvaluationAgent are used for specific tasks. Please see below the orchestration of the agents in UFO:","title":"Agents"},{"location":"agents/overview/#main-components","text":"An agent in UFO is composed of the following main components to fulfill its role in the UFO system: Component Description State Represents the current state of the agent and determines the next action and agent to handle the request. Memory Stores information about the user request, application state, and other relevant data. Blackboard Stores information shared between agents. Prompter Generates prompts for the language model based on the user request and application state. Processor Processes the workflow of the agent, including handling user requests, executing actions, and memory management.","title":"Main Components"},{"location":"agents/overview/#reference","text":"Below is the reference for the Agent class in UFO. All agents in UFO inherit from the Agent class and implement necessary methods to fulfill their roles in the UFO system. Bases: ABC The BasicAgent class is the abstract class for the agent. Initialize the BasicAgent. Parameters: name ( str ) \u2013 The name of the agent. Source code in agents/agent/basic.py 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 def __init__ ( self , name : str ) -> None : \"\"\" Initialize the BasicAgent. :param name: The name of the agent. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = self . status_manager . CONTINUE . value self . _register_self () self . retriever_factory = retriever . RetrieverFactory () self . _memory = Memory () self . _host = None self . _processor : Optional [ BaseProcessor ] = None self . _state = None self . Puppeteer : puppeteer . AppPuppeteer = None","title":"Reference"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.blackboard","text":"Get the blackboard. Returns: Blackboard \u2013 The blackboard.","title":"blackboard"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.host","text":"Get the host of the agent. Returns: HostAgent \u2013 The host of the agent.","title":"host"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.memory","text":"Get the memory of the agent. Returns: Memory \u2013 The memory of the agent.","title":"memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.name","text":"Get the name of the agent. Returns: str \u2013 The name of the agent.","title":"name"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.processor","text":"Get the processor. Returns: BaseProcessor \u2013 The processor.","title":"processor"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.state","text":"Get the state of the agent. Returns: AgentState \u2013 The state of the agent.","title":"state"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.status","text":"Get the status of the agent. Returns: str \u2013 The status of the agent.","title":"status"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.status_manager","text":"Get the status manager. Returns: AgentStatus \u2013 The status manager.","title":"status_manager"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.step","text":"Get the step of the agent. Returns: int \u2013 The step of the agent.","title":"step"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.add_memory","text":"Update the memory of the agent. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/agent/basic.py 181 182 183 184 185 186 def add_memory ( self , memory_item : MemoryItem ) -> None : \"\"\" Update the memory of the agent. :param memory_item: The memory item to add. \"\"\" self . _memory . add_memory_item ( memory_item )","title":"add_memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_experience_retriever","text":"Build the experience retriever. Source code in agents/agent/basic.py 323 324 325 326 327 def build_experience_retriever ( self ) -> None : \"\"\" Build the experience retriever. \"\"\" pass","title":"build_experience_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_human_demonstration_retriever","text":"Build the human demonstration retriever. Source code in agents/agent/basic.py 329 330 331 332 333 def build_human_demonstration_retriever ( self ) -> None : \"\"\" Build the human demonstration retriever. \"\"\" pass","title":"build_human_demonstration_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_offline_docs_retriever","text":"Build the offline docs retriever. Source code in agents/agent/basic.py 311 312 313 314 315 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" pass","title":"build_offline_docs_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_online_search_retriever","text":"Build the online search retriever. Source code in agents/agent/basic.py 317 318 319 320 321 def build_online_search_retriever ( self ) -> None : \"\"\" Build the online search retriever. \"\"\" pass","title":"build_online_search_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.clear_memory","text":"Clear the memory of the agent. Source code in agents/agent/basic.py 195 196 197 198 199 def clear_memory ( self ) -> None : \"\"\" Clear the memory of the agent. \"\"\" self . _memory . clear ()","title":"clear_memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.create_puppeteer_interface","text":"Create the puppeteer interface. Source code in agents/agent/basic.py 233 234 235 236 237 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the puppeteer interface. \"\"\" pass","title":"create_puppeteer_interface"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.delete_memory","text":"Delete the memory of the agent. Parameters: step ( int ) \u2013 The step of the memory item to delete. Source code in agents/agent/basic.py 188 189 190 191 192 193 def delete_memory ( self , step : int ) -> None : \"\"\" Delete the memory of the agent. :param step: The step of the memory item to delete. \"\"\" self . _memory . delete_memory_item ( step )","title":"delete_memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.get_cls","text":"Retrieves an agent class from the registry. Parameters: name ( str ) \u2013 The name of the agent class. Returns: Type ['BasicAgent'] \u2013 The agent class. Source code in agents/agent/basic.py 350 351 352 353 354 355 356 357 @classmethod def get_cls ( cls , name : str ) -> Type [ \"BasicAgent\" ]: \"\"\" Retrieves an agent class from the registry. :param name: The name of the agent class. :return: The agent class. \"\"\" return AgentRegistry () . get_cls ( name )","title":"get_cls"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.get_prompter","text":"Get the prompt for the agent. Returns: str \u2013 The prompt. Source code in agents/agent/basic.py 124 125 126 127 128 129 130 @abstractmethod def get_prompter ( self ) -> str : \"\"\" Get the prompt for the agent. :return: The prompt. \"\"\" pass","title":"get_prompter"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.get_response","text":"Get the response for the prompt. Parameters: message ( List [ dict ] ) \u2013 The message for LLMs. namescope ( str ) \u2013 The namescope for the LLMs. use_backup_engine ( bool ) \u2013 Whether to use the backup engine. Returns: str \u2013 The response. Source code in agents/agent/basic.py 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 @classmethod def get_response ( cls , message : List [ dict ], namescope : str , use_backup_engine : bool , configs = configs ) -> str : \"\"\" Get the response for the prompt. :param message: The message for LLMs. :param namescope: The namescope for the LLMs. :param use_backup_engine: Whether to use the backup engine. :return: The response. \"\"\" response_string , cost = llm_call . get_completion ( message , namescope , use_backup_engine = use_backup_engine , configs = configs ) return response_string , cost","title":"get_response"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.handle","text":"Handle the agent. Parameters: context ( Context ) \u2013 The context for the agent. Source code in agents/agent/basic.py 220 221 222 223 224 225 def handle ( self , context : Context ) -> None : \"\"\" Handle the agent. :param context: The context for the agent. \"\"\" self . state . handle ( self , context )","title":"handle"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.message_constructor","text":"Construct the message. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/basic.py 132 133 134 135 136 137 138 @abstractmethod def message_constructor ( self ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :return: The message. \"\"\" pass","title":"message_constructor"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.print_response","text":"Print the response. Source code in agents/agent/basic.py 335 336 337 338 339 def print_response ( self ) -> None : \"\"\" Print the response. \"\"\" pass","title":"print_response"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process","text":"Process the agent. Source code in agents/agent/basic.py 227 228 229 230 231 def process ( self , context : Context ) -> None : \"\"\" Process the agent. \"\"\" pass","title":"process"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process_asker","text":"Ask for the process. Parameters: ask_user ( bool , default: True ) \u2013 Whether to ask the user for the questions. Source code in agents/agent/basic.py 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 def process_asker ( self , ask_user : bool = True ) -> None : \"\"\" Ask for the process. :param ask_user: Whether to ask the user for the questions. \"\"\" if self . processor : question_list = self . processor . question_list if ask_user : utils . print_with_color ( \"Could you please answer the following questions to help me understand your needs and complete the task?\" , \"yellow\" , ) for index , question in enumerate ( question_list ): if ask_user : answer = question_asker ( question , index + 1 ) if not answer . strip (): continue qa_pair = { \"question\" : question , \"answer\" : answer } utils . append_string_to_file ( configs [ \"QA_PAIR_FILE\" ], json . dumps ( qa_pair ) ) else : qa_pair = { \"question\" : question , \"answer\" : \"The answer for the question is not available, please proceed with your own knowledge or experience, or leave it as a placeholder. Do not ask the same question again.\" , } self . blackboard . add_questions ( qa_pair )","title":"process_asker"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process_comfirmation","text":"Confirm the process. Source code in agents/agent/basic.py 280 281 282 283 284 285 @abstractmethod def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. \"\"\" pass","title":"process_comfirmation"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process_resume","text":"Resume the process. Source code in agents/agent/basic.py 239 240 241 242 243 244 def process_resume ( self ) -> None : \"\"\" Resume the process. \"\"\" if self . processor : self . processor . resume ()","title":"process_resume"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.reflection","text":"TODO: Reflect on the action. Source code in agents/agent/basic.py 201 202 203 204 205 206 def reflection ( self ) -> None : \"\"\" TODO: Reflect on the action. \"\"\" pass","title":"reflection"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.response_to_dict","text":"Convert the response to a dictionary. Parameters: response ( str ) \u2013 The response. Returns: Dict [ str , str ] \u2013 The dictionary. Source code in agents/agent/basic.py 156 157 158 159 160 161 162 163 @staticmethod def response_to_dict ( response : str ) -> Dict [ str , str ]: \"\"\" Convert the response to a dictionary. :param response: The response. :return: The dictionary. \"\"\" return utils . json_parser ( response )","title":"response_to_dict"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.set_state","text":"Set the state of the agent. Parameters: state ( AgentState ) \u2013 The state of the agent. Source code in agents/agent/basic.py 208 209 210 211 212 213 214 215 216 217 218 def set_state ( self , state : AgentState ) -> None : \"\"\" Set the state of the agent. :param state: The state of the agent. \"\"\" assert issubclass ( type ( self ), state . agent_class () ), f \"The state is only for agent type of { state . agent_class () } , but the current agent is { type ( self ) } .\" self . _state = state","title":"set_state"},{"location":"agents/design/blackboard/","text":"Agent Blackboard The Blackboard is a shared memory space that is visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The Blackboard is a key component of the UFO framework, enabling agents to share information and collaborate to fulfill user requests. The Blackboard is implemented as a class in the ufo/agents/memory/blackboard.py file. Components The Blackboard consists of the following data components: Component Description questions A list of questions that UFO asks the user, along with their corresponding answers. requests A list of historical user requests received in previous Round . trajectories A list of step-wise trajectories that record the agent's actions and decisions at each step. screenshots A list of screenshots taken by the agent when it believes the current state is important for future reference. Tip The keys stored in the trajectories are configured as HISTORY_KEYS in the config_dev.yaml file. You can customize the keys based on your requirements and the agent's logic. Tip Whether to save the screenshots is determined by the AppAgent . You can enable or disable screenshot capture by setting the SCREENSHOT_TO_MEMORY flag in the config_dev.yaml file. Blackboard to Prompt Data in the Blackboard is based on the MemoryItem class. It has a method blackboard_to_prompt that converts the information stored in the Blackboard to a string prompt. Agents call this method to construct the prompt for the LLM's inference. The blackboard_to_prompt method is defined as follows: def blackboard_to_prompt(self) -> List[str]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\": \"text\", \"text\": \"[Blackboard:]\", } ] blackboard_prompt = ( prefix + self.texts_to_prompt(self.questions, \"[Questions & Answers:]\") + self.texts_to_prompt(self.requests, \"[Request History:]\") + self.texts_to_prompt(self.trajectories, \"[Step Trajectories Completed Previously:]\") + self.screenshots_to_prompt() ) return blackboard_prompt Reference Class for the blackboard, which stores the data and images which are visible to all the agents. Initialize the blackboard. Source code in agents/memory/blackboard.py 41 42 43 44 45 46 47 48 49 50 51 52 53 def __init__ ( self ) -> None : \"\"\" Initialize the blackboard. \"\"\" self . _questions : Memory = Memory () self . _requests : Memory = Memory () self . _trajectories : Memory = Memory () self . _screenshots : Memory = Memory () if configs . get ( \"USE_CUSTOMIZATION\" , False ): self . load_questions ( configs . get ( \"QA_PAIR_FILE\" , \"\" ), configs . get ( \"QA_PAIR_NUM\" , - 1 ) ) questions : Memory property Get the data from the blackboard. Returns: Memory \u2013 The questions from the blackboard. requests : Memory property Get the data from the blackboard. Returns: Memory \u2013 The requests from the blackboard. screenshots : Memory property Get the images from the blackboard. Returns: Memory \u2013 The images from the blackboard. trajectories : Memory property Get the data from the blackboard. Returns: Memory \u2013 The trajectories from the blackboard. add_data ( data , memory ) Add the data to the a memory in the blackboard. Parameters: data ( Union [ MemoryItem , Dict [ str , str ], str ] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. memory ( Memory ) \u2013 The memory to add the data to. Source code in agents/memory/blackboard.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 def add_data ( self , data : Union [ MemoryItem , Dict [ str , str ], str ], memory : Memory ) -> None : \"\"\" Add the data to the a memory in the blackboard. :param data: The data to be added. It can be a dictionary or a MemoryItem or a string. :param memory: The memory to add the data to. \"\"\" if isinstance ( data , dict ): data_memory = MemoryItem () data_memory . add_values_from_dict ( data ) memory . add_memory_item ( data_memory ) elif isinstance ( data , MemoryItem ): memory . add_memory_item ( data ) elif isinstance ( data , str ): data_memory = MemoryItem () data_memory . add_values_from_dict ({ \"text\" : data }) memory . add_memory_item ( data_memory ) add_image ( screenshot_path = '' , metadata = None ) Add the image to the blackboard. Parameters: screenshot_path ( str , default: '' ) \u2013 The path of the image. metadata ( Optional [ Dict [ str , str ]] , default: None ) \u2013 The metadata of the image. Source code in agents/memory/blackboard.py 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 def add_image ( self , screenshot_path : str = \"\" , metadata : Optional [ Dict [ str , str ]] = None , ) -> None : \"\"\" Add the image to the blackboard. :param screenshot_path: The path of the image. :param metadata: The metadata of the image. \"\"\" if os . path . exists ( screenshot_path ): screenshot_str = PhotographerFacade () . encode_image_from_path ( screenshot_path ) else : print ( f \"Screenshot path { screenshot_path } does not exist.\" ) screenshot_str = \"\" image_memory_item = ImageMemoryItem () image_memory_item . add_values_from_dict ( { ImageMemoryItemNames . METADATA : metadata . get ( ImageMemoryItemNames . METADATA ), ImageMemoryItemNames . IMAGE_PATH : screenshot_path , ImageMemoryItemNames . IMAGE_STR : screenshot_str , } ) self . screenshots . add_memory_item ( image_memory_item ) add_questions ( questions ) Add the data to the blackboard. Parameters: questions ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 107 108 109 110 111 112 113 def add_questions ( self , questions : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param questions: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( questions , self . questions ) add_requests ( requests ) Add the data to the blackboard. Parameters: requests ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 115 116 117 118 119 120 121 def add_requests ( self , requests : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param requests: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( requests , self . requests ) add_trajectories ( trajectories ) Add the data to the blackboard. Parameters: trajectories ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 123 124 125 126 127 128 129 def add_trajectories ( self , trajectories : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param trajectories: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( trajectories , self . trajectories ) blackboard_to_prompt () Convert the blackboard to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 def blackboard_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\" : \"text\" , \"text\" : \"[Blackboard:]\" , } ] blackboard_prompt = ( prefix + self . texts_to_prompt ( self . questions , \"[Questions & Answers:]\" ) + self . texts_to_prompt ( self . requests , \"[Request History:]\" ) + self . texts_to_prompt ( self . trajectories , \"[Step Trajectories Completed Previously:]\" ) + self . screenshots_to_prompt () ) return blackboard_prompt clear () Clear the blackboard. Source code in agents/memory/blackboard.py 277 278 279 280 281 282 283 284 def clear ( self ) -> None : \"\"\" Clear the blackboard. \"\"\" self . questions . clear () self . requests . clear () self . trajectories . clear () self . screenshots . clear () is_empty () Check if the blackboard is empty. Returns: bool \u2013 True if the blackboard is empty, False otherwise. Source code in agents/memory/blackboard.py 265 266 267 268 269 270 271 272 273 274 275 def is_empty ( self ) -> bool : \"\"\" Check if the blackboard is empty. :return: True if the blackboard is empty, False otherwise. \"\"\" return ( self . questions . is_empty () and self . requests . is_empty () and self . trajectories . is_empty () and self . screenshots . is_empty () ) load_questions ( file_path , last_k =- 1 ) Load the data from a file. Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Source code in agents/memory/blackboard.py 192 193 194 195 196 197 198 199 200 def load_questions ( self , file_path : str , last_k =- 1 ) -> None : \"\"\" Load the data from a file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. \"\"\" qa_list = self . read_json_file ( file_path , last_k ) for qa in qa_list : self . add_questions ( qa ) questions_to_json () Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 164 165 166 167 168 169 def questions_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . questions . to_json () read_json_file ( file_path , last_k =- 1 ) staticmethod Read the json file. Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Returns: Dict [ str , str ] \u2013 The data in the file. Source code in agents/memory/blackboard.py 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 @staticmethod def read_json_file ( file_path : str , last_k =- 1 ) -> Dict [ str , str ]: \"\"\" Read the json file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. :return: The data in the file. \"\"\" data_list = [] # Check if the file exists if os . path . exists ( file_path ): # Open the file and read the lines with open ( file_path , \"r\" , encoding = \"utf-8\" ) as file : lines = file . readlines () # If last_k is not -1, only read the last k lines if last_k != - 1 : lines = lines [ - last_k :] # Parse the lines as JSON for line in lines : try : data = json . loads ( line . strip ()) data_list . append ( data ) except json . JSONDecodeError : print ( f \"Warning: Unable to parse line as JSON: { line } \" ) return data_list requests_to_json () Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 171 172 173 174 175 176 def requests_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . requests . to_json () screenshots_to_json () Convert the images to a dictionary. Returns: str \u2013 The images in the dictionary format. Source code in agents/memory/blackboard.py 185 186 187 188 189 190 def screenshots_to_json ( self ) -> str : \"\"\" Convert the images to a dictionary. :return: The images in the dictionary format. \"\"\" return self . screenshots . to_json () screenshots_to_prompt () Convert the images to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 def screenshots_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the images to a prompt. :return: The prompt. \"\"\" user_content = [] for screenshot_dict in self . screenshots . list_content : user_content . append ( { \"type\" : \"text\" , \"text\" : json . dumps ( screenshot_dict . get ( ImageMemoryItemNames . METADATA , \"\" ) ), } ) user_content . append ( { \"type\" : \"image_url\" , \"image_url\" : { \"url\" : screenshot_dict . get ( ImageMemoryItemNames . IMAGE_STR , \"\" ) }, } ) return user_content texts_to_prompt ( memory , prefix ) Convert the data to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 202 203 204 205 206 207 208 209 210 211 212 def texts_to_prompt ( self , memory : Memory , prefix : str ) -> List [ str ]: \"\"\" Convert the data to a prompt. :return: The prompt. \"\"\" user_content = [ { \"type\" : \"text\" , \"text\" : f \" { prefix } \\n { json . dumps ( memory . list_content ) } \" } ] return user_content trajectories_to_json () Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 178 179 180 181 182 183 def trajectories_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . trajectories . to_json () Note You can customize the class to tailor the Blackboard to your requirements.","title":"Blackboard"},{"location":"agents/design/blackboard/#agent-blackboard","text":"The Blackboard is a shared memory space that is visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The Blackboard is a key component of the UFO framework, enabling agents to share information and collaborate to fulfill user requests. The Blackboard is implemented as a class in the ufo/agents/memory/blackboard.py file.","title":"Agent Blackboard"},{"location":"agents/design/blackboard/#components","text":"The Blackboard consists of the following data components: Component Description questions A list of questions that UFO asks the user, along with their corresponding answers. requests A list of historical user requests received in previous Round . trajectories A list of step-wise trajectories that record the agent's actions and decisions at each step. screenshots A list of screenshots taken by the agent when it believes the current state is important for future reference. Tip The keys stored in the trajectories are configured as HISTORY_KEYS in the config_dev.yaml file. You can customize the keys based on your requirements and the agent's logic. Tip Whether to save the screenshots is determined by the AppAgent . You can enable or disable screenshot capture by setting the SCREENSHOT_TO_MEMORY flag in the config_dev.yaml file.","title":"Components"},{"location":"agents/design/blackboard/#blackboard-to-prompt","text":"Data in the Blackboard is based on the MemoryItem class. It has a method blackboard_to_prompt that converts the information stored in the Blackboard to a string prompt. Agents call this method to construct the prompt for the LLM's inference. The blackboard_to_prompt method is defined as follows: def blackboard_to_prompt(self) -> List[str]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\": \"text\", \"text\": \"[Blackboard:]\", } ] blackboard_prompt = ( prefix + self.texts_to_prompt(self.questions, \"[Questions & Answers:]\") + self.texts_to_prompt(self.requests, \"[Request History:]\") + self.texts_to_prompt(self.trajectories, \"[Step Trajectories Completed Previously:]\") + self.screenshots_to_prompt() ) return blackboard_prompt","title":"Blackboard to Prompt"},{"location":"agents/design/blackboard/#reference","text":"Class for the blackboard, which stores the data and images which are visible to all the agents. Initialize the blackboard. Source code in agents/memory/blackboard.py 41 42 43 44 45 46 47 48 49 50 51 52 53 def __init__ ( self ) -> None : \"\"\" Initialize the blackboard. \"\"\" self . _questions : Memory = Memory () self . _requests : Memory = Memory () self . _trajectories : Memory = Memory () self . _screenshots : Memory = Memory () if configs . get ( \"USE_CUSTOMIZATION\" , False ): self . load_questions ( configs . get ( \"QA_PAIR_FILE\" , \"\" ), configs . get ( \"QA_PAIR_NUM\" , - 1 ) )","title":"Reference"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.questions","text":"Get the data from the blackboard. Returns: Memory \u2013 The questions from the blackboard.","title":"questions"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.requests","text":"Get the data from the blackboard. Returns: Memory \u2013 The requests from the blackboard.","title":"requests"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.screenshots","text":"Get the images from the blackboard. Returns: Memory \u2013 The images from the blackboard.","title":"screenshots"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.trajectories","text":"Get the data from the blackboard. Returns: Memory \u2013 The trajectories from the blackboard.","title":"trajectories"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_data","text":"Add the data to the a memory in the blackboard. Parameters: data ( Union [ MemoryItem , Dict [ str , str ], str ] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. memory ( Memory ) \u2013 The memory to add the data to. Source code in agents/memory/blackboard.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 def add_data ( self , data : Union [ MemoryItem , Dict [ str , str ], str ], memory : Memory ) -> None : \"\"\" Add the data to the a memory in the blackboard. :param data: The data to be added. It can be a dictionary or a MemoryItem or a string. :param memory: The memory to add the data to. \"\"\" if isinstance ( data , dict ): data_memory = MemoryItem () data_memory . add_values_from_dict ( data ) memory . add_memory_item ( data_memory ) elif isinstance ( data , MemoryItem ): memory . add_memory_item ( data ) elif isinstance ( data , str ): data_memory = MemoryItem () data_memory . add_values_from_dict ({ \"text\" : data }) memory . add_memory_item ( data_memory )","title":"add_data"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_image","text":"Add the image to the blackboard. Parameters: screenshot_path ( str , default: '' ) \u2013 The path of the image. metadata ( Optional [ Dict [ str , str ]] , default: None ) \u2013 The metadata of the image. Source code in agents/memory/blackboard.py 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 def add_image ( self , screenshot_path : str = \"\" , metadata : Optional [ Dict [ str , str ]] = None , ) -> None : \"\"\" Add the image to the blackboard. :param screenshot_path: The path of the image. :param metadata: The metadata of the image. \"\"\" if os . path . exists ( screenshot_path ): screenshot_str = PhotographerFacade () . encode_image_from_path ( screenshot_path ) else : print ( f \"Screenshot path { screenshot_path } does not exist.\" ) screenshot_str = \"\" image_memory_item = ImageMemoryItem () image_memory_item . add_values_from_dict ( { ImageMemoryItemNames . METADATA : metadata . get ( ImageMemoryItemNames . METADATA ), ImageMemoryItemNames . IMAGE_PATH : screenshot_path , ImageMemoryItemNames . IMAGE_STR : screenshot_str , } ) self . screenshots . add_memory_item ( image_memory_item )","title":"add_image"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_questions","text":"Add the data to the blackboard. Parameters: questions ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 107 108 109 110 111 112 113 def add_questions ( self , questions : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param questions: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( questions , self . questions )","title":"add_questions"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_requests","text":"Add the data to the blackboard. Parameters: requests ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 115 116 117 118 119 120 121 def add_requests ( self , requests : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param requests: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( requests , self . requests )","title":"add_requests"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_trajectories","text":"Add the data to the blackboard. Parameters: trajectories ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 123 124 125 126 127 128 129 def add_trajectories ( self , trajectories : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param trajectories: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( trajectories , self . trajectories )","title":"add_trajectories"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.blackboard_to_prompt","text":"Convert the blackboard to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 def blackboard_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\" : \"text\" , \"text\" : \"[Blackboard:]\" , } ] blackboard_prompt = ( prefix + self . texts_to_prompt ( self . questions , \"[Questions & Answers:]\" ) + self . texts_to_prompt ( self . requests , \"[Request History:]\" ) + self . texts_to_prompt ( self . trajectories , \"[Step Trajectories Completed Previously:]\" ) + self . screenshots_to_prompt () ) return blackboard_prompt","title":"blackboard_to_prompt"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.clear","text":"Clear the blackboard. Source code in agents/memory/blackboard.py 277 278 279 280 281 282 283 284 def clear ( self ) -> None : \"\"\" Clear the blackboard. \"\"\" self . questions . clear () self . requests . clear () self . trajectories . clear () self . screenshots . clear ()","title":"clear"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.is_empty","text":"Check if the blackboard is empty. Returns: bool \u2013 True if the blackboard is empty, False otherwise. Source code in agents/memory/blackboard.py 265 266 267 268 269 270 271 272 273 274 275 def is_empty ( self ) -> bool : \"\"\" Check if the blackboard is empty. :return: True if the blackboard is empty, False otherwise. \"\"\" return ( self . questions . is_empty () and self . requests . is_empty () and self . trajectories . is_empty () and self . screenshots . is_empty () )","title":"is_empty"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.load_questions","text":"Load the data from a file. Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Source code in agents/memory/blackboard.py 192 193 194 195 196 197 198 199 200 def load_questions ( self , file_path : str , last_k =- 1 ) -> None : \"\"\" Load the data from a file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. \"\"\" qa_list = self . read_json_file ( file_path , last_k ) for qa in qa_list : self . add_questions ( qa )","title":"load_questions"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.questions_to_json","text":"Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 164 165 166 167 168 169 def questions_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . questions . to_json ()","title":"questions_to_json"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.read_json_file","text":"Read the json file. Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Returns: Dict [ str , str ] \u2013 The data in the file. Source code in agents/memory/blackboard.py 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 @staticmethod def read_json_file ( file_path : str , last_k =- 1 ) -> Dict [ str , str ]: \"\"\" Read the json file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. :return: The data in the file. \"\"\" data_list = [] # Check if the file exists if os . path . exists ( file_path ): # Open the file and read the lines with open ( file_path , \"r\" , encoding = \"utf-8\" ) as file : lines = file . readlines () # If last_k is not -1, only read the last k lines if last_k != - 1 : lines = lines [ - last_k :] # Parse the lines as JSON for line in lines : try : data = json . loads ( line . strip ()) data_list . append ( data ) except json . JSONDecodeError : print ( f \"Warning: Unable to parse line as JSON: { line } \" ) return data_list","title":"read_json_file"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.requests_to_json","text":"Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 171 172 173 174 175 176 def requests_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . requests . to_json ()","title":"requests_to_json"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.screenshots_to_json","text":"Convert the images to a dictionary. Returns: str \u2013 The images in the dictionary format. Source code in agents/memory/blackboard.py 185 186 187 188 189 190 def screenshots_to_json ( self ) -> str : \"\"\" Convert the images to a dictionary. :return: The images in the dictionary format. \"\"\" return self . screenshots . to_json ()","title":"screenshots_to_json"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.screenshots_to_prompt","text":"Convert the images to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 def screenshots_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the images to a prompt. :return: The prompt. \"\"\" user_content = [] for screenshot_dict in self . screenshots . list_content : user_content . append ( { \"type\" : \"text\" , \"text\" : json . dumps ( screenshot_dict . get ( ImageMemoryItemNames . METADATA , \"\" ) ), } ) user_content . append ( { \"type\" : \"image_url\" , \"image_url\" : { \"url\" : screenshot_dict . get ( ImageMemoryItemNames . IMAGE_STR , \"\" ) }, } ) return user_content","title":"screenshots_to_prompt"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.texts_to_prompt","text":"Convert the data to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 202 203 204 205 206 207 208 209 210 211 212 def texts_to_prompt ( self , memory : Memory , prefix : str ) -> List [ str ]: \"\"\" Convert the data to a prompt. :return: The prompt. \"\"\" user_content = [ { \"type\" : \"text\" , \"text\" : f \" { prefix } \\n { json . dumps ( memory . list_content ) } \" } ] return user_content","title":"texts_to_prompt"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.trajectories_to_json","text":"Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 178 179 180 181 182 183 def trajectories_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . trajectories . to_json () Note You can customize the class to tailor the Blackboard to your requirements.","title":"trajectories_to_json"},{"location":"agents/design/memory/","text":"Agent Memory The Memory manages the memory of the agent and stores the information required for the agent to interact with the user and applications at every step. Parts of elements in the Memory will be visible to the agent for decision-making. MemoryItem A MemoryItem is a dataclass that represents a single step in the agent's memory. The fields of a MemoryItem is flexible and can be customized based on the requirements of the agent. The MemoryItem class is defined as follows: This data class represents a memory item of an agent at one step. attributes : List [ str ] property Get the attributes of the memory item. Returns: List [ str ] \u2013 The attributes. add_values_from_dict ( values ) Add fields to the memory item. Parameters: values ( Dict [ str , Any ] ) \u2013 The values of the fields. Source code in agents/memory/memory.py 57 58 59 60 61 62 63 def add_values_from_dict ( self , values : Dict [ str , Any ]) -> None : \"\"\" Add fields to the memory item. :param values: The values of the fields. \"\"\" for key , value in values . items (): self . set_value ( key , value ) filter ( keys = []) Fetch the memory item. Parameters: keys ( List [ str ] , default: [] ) \u2013 The keys to fetch. Returns: None \u2013 The filtered memory item. Source code in agents/memory/memory.py 37 38 39 40 41 42 43 44 def filter ( self , keys : List [ str ] = []) -> None : \"\"\" Fetch the memory item. :param keys: The keys to fetch. :return: The filtered memory item. \"\"\" return { key : value for key , value in self . to_dict () . items () if key in keys } get_value ( key ) Get the value of the field. Parameters: key ( str ) \u2013 The key of the field. Returns: Optional [ str ] \u2013 The value of the field. Source code in agents/memory/memory.py 65 66 67 68 69 70 71 72 def get_value ( self , key : str ) -> Optional [ str ]: \"\"\" Get the value of the field. :param key: The key of the field. :return: The value of the field. \"\"\" return getattr ( self , key , None ) get_values ( keys ) Get the values of the fields. Parameters: keys ( List [ str ] ) \u2013 The keys of the fields. Returns: dict \u2013 The values of the fields. Source code in agents/memory/memory.py 74 75 76 77 78 79 80 def get_values ( self , keys : List [ str ]) -> dict : \"\"\" Get the values of the fields. :param keys: The keys of the fields. :return: The values of the fields. \"\"\" return { key : self . get_value ( key ) for key in keys } set_value ( key , value ) Add a field to the memory item. Parameters: key ( str ) \u2013 The key of the field. value ( str ) \u2013 The value of the field. Source code in agents/memory/memory.py 46 47 48 49 50 51 52 53 54 55 def set_value ( self , key : str , value : str ) -> None : \"\"\" Add a field to the memory item. :param key: The key of the field. :param value: The value of the field. \"\"\" setattr ( self , key , value ) if key not in self . _memory_attributes : self . _memory_attributes . append ( key ) to_dict () Convert the MemoryItem to a dictionary. Returns: Dict [ str , str ] \u2013 The dictionary. Source code in agents/memory/memory.py 19 20 21 22 23 24 25 26 27 28 def to_dict ( self ) -> Dict [ str , str ]: \"\"\" Convert the MemoryItem to a dictionary. :return: The dictionary. \"\"\" return { key : value for key , value in self . __dict__ . items () if key in self . _memory_attributes } to_json () Convert the memory item to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 30 31 32 33 34 35 def to_json ( self ) -> str : \"\"\" Convert the memory item to a JSON string. :return: The JSON string. \"\"\" return json . dumps ( self . to_dict ()) Info At each step, an instance of MemoryItem is created and stored in the Memory to record the information of the agent's interaction with the user and applications. Memory The Memory class is responsible for managing the memory of the agent. It stores a list of MemoryItem instances that represent the agent's memory at each step. The Memory class is defined as follows: This data class represents a memory of an agent. content : List [ MemoryItem ] property Get the content of the memory. Returns: List [ MemoryItem ] \u2013 The content of the memory. length : int property Get the length of the memory. Returns: int \u2013 The length of the memory. list_content : List [ Dict [ str , str ]] property List the content of the memory. Returns: List [ Dict [ str , str ]] \u2013 The content of the memory. add_memory_item ( memory_item ) Add a memory item to the memory. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/memory/memory.py 122 123 124 125 126 127 def add_memory_item ( self , memory_item : MemoryItem ) -> None : \"\"\" Add a memory item to the memory. :param memory_item: The memory item to add. \"\"\" self . _content . append ( memory_item ) clear () Clear the memory. Source code in agents/memory/memory.py 129 130 131 132 133 def clear ( self ) -> None : \"\"\" Clear the memory. \"\"\" self . _content = [] delete_memory_item ( step ) Delete a memory item from the memory. Parameters: step ( int ) \u2013 The step of the memory item to delete. Source code in agents/memory/memory.py 143 144 145 146 147 148 def delete_memory_item ( self , step : int ) -> None : \"\"\" Delete a memory item from the memory. :param step: The step of the memory item to delete. \"\"\" self . _content = [ item for item in self . _content if item . step != step ] filter_memory_from_keys ( keys ) Filter the memory from the keys. If an item does not have the key, the key will be ignored. Parameters: keys ( List [ str ] ) \u2013 The keys to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 114 115 116 117 118 119 120 def filter_memory_from_keys ( self , keys : List [ str ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the keys. If an item does not have the key, the key will be ignored. :param keys: The keys to filter. :return: The filtered memory. \"\"\" return [ item . filter ( keys ) for item in self . _content ] filter_memory_from_steps ( steps ) Filter the memory from the steps. Parameters: steps ( List [ int ] ) \u2013 The steps to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 106 107 108 109 110 111 112 def filter_memory_from_steps ( self , steps : List [ int ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the steps. :param steps: The steps to filter. :return: The filtered memory. \"\"\" return [ item . to_dict () for item in self . _content if item . step in steps ] get_latest_item () Get the latest memory item. Returns: MemoryItem \u2013 The latest memory item. Source code in agents/memory/memory.py 160 161 162 163 164 165 166 167 def get_latest_item ( self ) -> MemoryItem : \"\"\" Get the latest memory item. :return: The latest memory item. \"\"\" if self . length == 0 : return None return self . _content [ - 1 ] is_empty () Check if the memory is empty. Returns: bool \u2013 The boolean value indicating if the memory is empty. Source code in agents/memory/memory.py 185 186 187 188 189 190 def is_empty ( self ) -> bool : \"\"\" Check if the memory is empty. :return: The boolean value indicating if the memory is empty. \"\"\" return self . length == 0 load ( content ) Load the data from the memory. Parameters: content ( List [ MemoryItem ] ) \u2013 The content to load. Source code in agents/memory/memory.py 99 100 101 102 103 104 def load ( self , content : List [ MemoryItem ]) -> None : \"\"\" Load the data from the memory. :param content: The content to load. \"\"\" self . _content = content to_json () Convert the memory to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 150 151 152 153 154 155 156 157 158 def to_json ( self ) -> str : \"\"\" Convert the memory to a JSON string. :return: The JSON string. \"\"\" return json . dumps ( [ item . to_dict () for item in self . _content if item is not None ] ) Info Each agent has its own Memory instance to store their information. Info Not all information in the Memory are provided to the agent for decision-making. The agent can access parts of the memory based on the requirements of the agent's logic.","title":"Memory"},{"location":"agents/design/memory/#agent-memory","text":"The Memory manages the memory of the agent and stores the information required for the agent to interact with the user and applications at every step. Parts of elements in the Memory will be visible to the agent for decision-making.","title":"Agent Memory"},{"location":"agents/design/memory/#memoryitem","text":"A MemoryItem is a dataclass that represents a single step in the agent's memory. The fields of a MemoryItem is flexible and can be customized based on the requirements of the agent. The MemoryItem class is defined as follows: This data class represents a memory item of an agent at one step.","title":"MemoryItem"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.attributes","text":"Get the attributes of the memory item. Returns: List [ str ] \u2013 The attributes.","title":"attributes"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.add_values_from_dict","text":"Add fields to the memory item. Parameters: values ( Dict [ str , Any ] ) \u2013 The values of the fields. Source code in agents/memory/memory.py 57 58 59 60 61 62 63 def add_values_from_dict ( self , values : Dict [ str , Any ]) -> None : \"\"\" Add fields to the memory item. :param values: The values of the fields. \"\"\" for key , value in values . items (): self . set_value ( key , value )","title":"add_values_from_dict"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.filter","text":"Fetch the memory item. Parameters: keys ( List [ str ] , default: [] ) \u2013 The keys to fetch. Returns: None \u2013 The filtered memory item. Source code in agents/memory/memory.py 37 38 39 40 41 42 43 44 def filter ( self , keys : List [ str ] = []) -> None : \"\"\" Fetch the memory item. :param keys: The keys to fetch. :return: The filtered memory item. \"\"\" return { key : value for key , value in self . to_dict () . items () if key in keys }","title":"filter"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.get_value","text":"Get the value of the field. Parameters: key ( str ) \u2013 The key of the field. Returns: Optional [ str ] \u2013 The value of the field. Source code in agents/memory/memory.py 65 66 67 68 69 70 71 72 def get_value ( self , key : str ) -> Optional [ str ]: \"\"\" Get the value of the field. :param key: The key of the field. :return: The value of the field. \"\"\" return getattr ( self , key , None )","title":"get_value"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.get_values","text":"Get the values of the fields. Parameters: keys ( List [ str ] ) \u2013 The keys of the fields. Returns: dict \u2013 The values of the fields. Source code in agents/memory/memory.py 74 75 76 77 78 79 80 def get_values ( self , keys : List [ str ]) -> dict : \"\"\" Get the values of the fields. :param keys: The keys of the fields. :return: The values of the fields. \"\"\" return { key : self . get_value ( key ) for key in keys }","title":"get_values"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.set_value","text":"Add a field to the memory item. Parameters: key ( str ) \u2013 The key of the field. value ( str ) \u2013 The value of the field. Source code in agents/memory/memory.py 46 47 48 49 50 51 52 53 54 55 def set_value ( self , key : str , value : str ) -> None : \"\"\" Add a field to the memory item. :param key: The key of the field. :param value: The value of the field. \"\"\" setattr ( self , key , value ) if key not in self . _memory_attributes : self . _memory_attributes . append ( key )","title":"set_value"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.to_dict","text":"Convert the MemoryItem to a dictionary. Returns: Dict [ str , str ] \u2013 The dictionary. Source code in agents/memory/memory.py 19 20 21 22 23 24 25 26 27 28 def to_dict ( self ) -> Dict [ str , str ]: \"\"\" Convert the MemoryItem to a dictionary. :return: The dictionary. \"\"\" return { key : value for key , value in self . __dict__ . items () if key in self . _memory_attributes }","title":"to_dict"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.to_json","text":"Convert the memory item to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 30 31 32 33 34 35 def to_json ( self ) -> str : \"\"\" Convert the memory item to a JSON string. :return: The JSON string. \"\"\" return json . dumps ( self . to_dict ()) Info At each step, an instance of MemoryItem is created and stored in the Memory to record the information of the agent's interaction with the user and applications.","title":"to_json"},{"location":"agents/design/memory/#memory","text":"The Memory class is responsible for managing the memory of the agent. It stores a list of MemoryItem instances that represent the agent's memory at each step. The Memory class is defined as follows: This data class represents a memory of an agent.","title":"Memory"},{"location":"agents/design/memory/#agents.memory.memory.Memory.content","text":"Get the content of the memory. Returns: List [ MemoryItem ] \u2013 The content of the memory.","title":"content"},{"location":"agents/design/memory/#agents.memory.memory.Memory.length","text":"Get the length of the memory. Returns: int \u2013 The length of the memory.","title":"length"},{"location":"agents/design/memory/#agents.memory.memory.Memory.list_content","text":"List the content of the memory. Returns: List [ Dict [ str , str ]] \u2013 The content of the memory.","title":"list_content"},{"location":"agents/design/memory/#agents.memory.memory.Memory.add_memory_item","text":"Add a memory item to the memory. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/memory/memory.py 122 123 124 125 126 127 def add_memory_item ( self , memory_item : MemoryItem ) -> None : \"\"\" Add a memory item to the memory. :param memory_item: The memory item to add. \"\"\" self . _content . append ( memory_item )","title":"add_memory_item"},{"location":"agents/design/memory/#agents.memory.memory.Memory.clear","text":"Clear the memory. Source code in agents/memory/memory.py 129 130 131 132 133 def clear ( self ) -> None : \"\"\" Clear the memory. \"\"\" self . _content = []","title":"clear"},{"location":"agents/design/memory/#agents.memory.memory.Memory.delete_memory_item","text":"Delete a memory item from the memory. Parameters: step ( int ) \u2013 The step of the memory item to delete. Source code in agents/memory/memory.py 143 144 145 146 147 148 def delete_memory_item ( self , step : int ) -> None : \"\"\" Delete a memory item from the memory. :param step: The step of the memory item to delete. \"\"\" self . _content = [ item for item in self . _content if item . step != step ]","title":"delete_memory_item"},{"location":"agents/design/memory/#agents.memory.memory.Memory.filter_memory_from_keys","text":"Filter the memory from the keys. If an item does not have the key, the key will be ignored. Parameters: keys ( List [ str ] ) \u2013 The keys to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 114 115 116 117 118 119 120 def filter_memory_from_keys ( self , keys : List [ str ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the keys. If an item does not have the key, the key will be ignored. :param keys: The keys to filter. :return: The filtered memory. \"\"\" return [ item . filter ( keys ) for item in self . _content ]","title":"filter_memory_from_keys"},{"location":"agents/design/memory/#agents.memory.memory.Memory.filter_memory_from_steps","text":"Filter the memory from the steps. Parameters: steps ( List [ int ] ) \u2013 The steps to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 106 107 108 109 110 111 112 def filter_memory_from_steps ( self , steps : List [ int ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the steps. :param steps: The steps to filter. :return: The filtered memory. \"\"\" return [ item . to_dict () for item in self . _content if item . step in steps ]","title":"filter_memory_from_steps"},{"location":"agents/design/memory/#agents.memory.memory.Memory.get_latest_item","text":"Get the latest memory item. Returns: MemoryItem \u2013 The latest memory item. Source code in agents/memory/memory.py 160 161 162 163 164 165 166 167 def get_latest_item ( self ) -> MemoryItem : \"\"\" Get the latest memory item. :return: The latest memory item. \"\"\" if self . length == 0 : return None return self . _content [ - 1 ]","title":"get_latest_item"},{"location":"agents/design/memory/#agents.memory.memory.Memory.is_empty","text":"Check if the memory is empty. Returns: bool \u2013 The boolean value indicating if the memory is empty. Source code in agents/memory/memory.py 185 186 187 188 189 190 def is_empty ( self ) -> bool : \"\"\" Check if the memory is empty. :return: The boolean value indicating if the memory is empty. \"\"\" return self . length == 0","title":"is_empty"},{"location":"agents/design/memory/#agents.memory.memory.Memory.load","text":"Load the data from the memory. Parameters: content ( List [ MemoryItem ] ) \u2013 The content to load. Source code in agents/memory/memory.py 99 100 101 102 103 104 def load ( self , content : List [ MemoryItem ]) -> None : \"\"\" Load the data from the memory. :param content: The content to load. \"\"\" self . _content = content","title":"load"},{"location":"agents/design/memory/#agents.memory.memory.Memory.to_json","text":"Convert the memory to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 150 151 152 153 154 155 156 157 158 def to_json ( self ) -> str : \"\"\" Convert the memory to a JSON string. :return: The JSON string. \"\"\" return json . dumps ( [ item . to_dict () for item in self . _content if item is not None ] ) Info Each agent has its own Memory instance to store their information. Info Not all information in the Memory are provided to the agent for decision-making. The agent can access parts of the memory based on the requirements of the agent's logic.","title":"to_json"},{"location":"agents/design/processor/","text":"Agents Processor The Processor is a key component of the agent to process the core logic of the agent to process the user's request. The Processor is implemented as a class in the ufo/agents/processors folder. Each agent has its own Processor class withing the folder. Core Process Once called, an agent follows a series of steps to process the user's request defined in the Processor class by calling the process method. The workflow of the process is as follows: Step Description Function 1 Print the step information. print_step_info 2 Capture the screenshot of the application. capture_screenshot 3 Get the control information of the application. get_control_info 4 Get the prompt message for the LLM. get_prompt_message 5 Generate the response from the LLM. get_response 6 Update the cost of the step. update_cost 7 Parse the response from the LLM. parse_response 8 Execute the action based on the response. execute_action 9 Update the memory and blackboard. update_memory 10 Update the status of the agent. update_status At each step, the Processor processes the user's request by invoking the corresponding method sequentially to execute the necessary actions. The process may be paused. It can be resumed, based on the agent's logic and the user's request using the resume method. Reference Below is the basic structure of the Processor class: Bases: ABC The base processor for the session. A session consists of multiple rounds of conversation with the user, completing a task. At each round, the HostAgent and AppAgent interact with the user and the application with the processor. Each processor is responsible for processing the user request and updating the HostAgent and AppAgent at a single step in a round. Initialize the processor. Parameters: context ( Context ) \u2013 The context of the session. agent ( BasicAgent ) \u2013 The agent who executes the processor. Source code in agents/processors/basic.py 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def __init__ ( self , agent : BasicAgent , context : Context ) -> None : \"\"\" Initialize the processor. :param context: The context of the session. :param agent: The agent who executes the processor. \"\"\" self . _context = context self . _agent = agent self . photographer = PhotographerFacade () self . control_inspector = ControlInspectorFacade ( BACKEND ) self . _prompt_message = None self . _status = None self . _response = None self . _cost = 0 self . _control_label = None self . _control_text = None self . _response_json = {} self . _memory_data = MemoryItem () self . _results = None self . _question_list = [] self . _agent_status_manager = self . agent . status_manager self . _is_resumed = False self . _action = None self . _plan = None self . _control_log = { \"control_class\" : None , \"control_type\" : None , \"control_automation_id\" : None , } self . _total_time_cost = 0 self . _time_cost = {} self . _exeception_traceback = {} action : str property writable Get the action. Returns: str \u2013 The action. agent : BasicAgent property Get the agent. Returns: BasicAgent \u2013 The agent. app_root : str property writable Get the application root. Returns: str \u2013 The application root. application_process_name : str property writable Get the application process name. Returns: str \u2013 The application process name. application_window : UIAWrapper property writable Get the active window. Returns: UIAWrapper \u2013 The active window. context : Context property Get the context. Returns: Context \u2013 The context. control_label : str property writable Get the control label. Returns: str \u2013 The control label. control_reannotate : List [ str ] property writable Get the control reannotation. Returns: List [ str ] \u2013 The control reannotation. control_text : str property writable Get the active application. Returns: str \u2013 The active application. cost : float property writable Get the cost of the processor. Returns: float \u2013 The cost of the processor. host_message : List [ str ] property writable Get the host message. Returns: List [ str ] \u2013 The host message. log_path : str property Get the log path. Returns: str \u2013 The log path. logger : str property Get the logger. Returns: str \u2013 The logger. name : str property Get the name of the processor. Returns: str \u2013 The name of the processor. plan : str property writable Get the plan of the agent. Returns: str \u2013 The plan. prev_plan : List [ str ] property Get the previous plan. Returns: List [ str ] \u2013 The previous plan of the agent. previous_subtasks : List [ str ] property writable Get the previous subtasks. Returns: List [ str ] \u2013 The previous subtasks. question_list : List [ str ] property writable Get the question list. Returns: List [ str ] \u2013 The question list. request : str property Get the request. Returns: str \u2013 The request. request_logger : str property Get the request logger. Returns: str \u2013 The request logger. round_cost : float property writable Get the round cost. Returns: float \u2013 The round cost. round_num : int property Get the round number. Returns: int \u2013 The round number. round_step : int property writable Get the round step. Returns: int \u2013 The round step. round_subtask_amount : int property Get the round subtask amount. Returns: int \u2013 The round subtask amount. session_cost : float property writable Get the session cost. Returns: float \u2013 The session cost. session_step : int property writable Get the session step. Returns: int \u2013 The session step. status : str property writable Get the status of the processor. Returns: str \u2013 The status of the processor. subtask : str property writable Get the subtask. Returns: str \u2013 The subtask. ui_tree_path : str property Get the UI tree path. Returns: str \u2013 The UI tree path. add_to_memory ( data_dict ) Add the data to the memory. Parameters: data_dict ( Dict [ str , Any ] ) \u2013 The data dictionary to be added to the memory. Source code in agents/processors/basic.py 297 298 299 300 301 302 def add_to_memory ( self , data_dict : Dict [ str , Any ]) -> None : \"\"\" Add the data to the memory. :param data_dict: The data dictionary to be added to the memory. \"\"\" self . _memory_data . add_values_from_dict ( data_dict ) capture_screenshot () abstractmethod Capture the screenshot. Source code in agents/processors/basic.py 235 236 237 238 239 240 @abstractmethod def capture_screenshot ( self ) -> None : \"\"\" Capture the screenshot. \"\"\" pass exception_capture ( func ) classmethod Decorator to capture the exception of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. Source code in agents/processors/basic.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 @classmethod def exception_capture ( cls , func ): \"\"\" Decorator to capture the exception of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): try : func ( self , * args , ** kwargs ) except Exception as e : self . _exeception_traceback [ func . __name__ ] = { \"type\" : str ( type ( e ) . __name__ ), \"message\" : str ( e ), \"traceback\" : traceback . format_exc (), } utils . print_with_color ( f \"Error Occurs at { func . __name__ } \" , \"red\" ) utils . print_with_color ( self . _exeception_traceback [ func . __name__ ][ \"traceback\" ], \"red\" ) if self . _response is not None : utils . print_with_color ( \"Response: \" , \"red\" ) utils . print_with_color ( self . _response , \"red\" ) self . _status = self . _agent_status_manager . ERROR . value self . sync_memory () self . add_to_memory ({ \"error\" : self . _exeception_traceback }) self . add_to_memory ({ \"Status\" : self . _status }) self . log_save () raise StopIteration ( \"Error occurred during step.\" ) return wrapper execute_action () abstractmethod Execute the action. Source code in agents/processors/basic.py 270 271 272 273 274 275 @abstractmethod def execute_action ( self ) -> None : \"\"\" Execute the action. \"\"\" pass get_control_info () abstractmethod Get the control information. Source code in agents/processors/basic.py 242 243 244 245 246 247 @abstractmethod def get_control_info ( self ) -> None : \"\"\" Get the control information. \"\"\" pass get_prompt_message () abstractmethod Get the prompt message. Source code in agents/processors/basic.py 249 250 251 252 253 254 @abstractmethod def get_prompt_message ( self ) -> None : \"\"\" Get the prompt message. \"\"\" pass get_response () abstractmethod Get the response from the LLM. Source code in agents/processors/basic.py 256 257 258 259 260 261 @abstractmethod def get_response ( self ) -> None : \"\"\" Get the response from the LLM. \"\"\" pass is_confirm () Check if the process is confirm. Returns: bool \u2013 The boolean value indicating if the process is confirm. Source code in agents/processors/basic.py 736 737 738 739 740 741 742 743 744 def is_confirm ( self ) -> bool : \"\"\" Check if the process is confirm. :return: The boolean value indicating if the process is confirm. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . CONFIRM . value is_error () Check if the process is in error. Returns: bool \u2013 The boolean value indicating if the process is in error. Source code in agents/processors/basic.py 704 705 706 707 708 709 710 711 def is_error ( self ) -> bool : \"\"\" Check if the process is in error. :return: The boolean value indicating if the process is in error. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . ERROR . value is_paused () Check if the process is paused. Returns: bool \u2013 The boolean value indicating if the process is paused. Source code in agents/processors/basic.py 713 714 715 716 717 718 719 720 721 722 723 724 def is_paused ( self ) -> bool : \"\"\" Check if the process is paused. :return: The boolean value indicating if the process is paused. \"\"\" self . agent . status = self . status return ( self . status == self . _agent_status_manager . PENDING . value or self . status == self . _agent_status_manager . CONFIRM . value ) is_pending () Check if the process is pending. Returns: bool \u2013 The boolean value indicating if the process is pending. Source code in agents/processors/basic.py 726 727 728 729 730 731 732 733 734 def is_pending ( self ) -> bool : \"\"\" Check if the process is pending. :return: The boolean value indicating if the process is pending. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . PENDING . value log ( response_json ) Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. Source code in agents/processors/basic.py 746 747 748 749 750 751 752 753 754 def log ( self , response_json : Dict [ str , Any ]) -> None : \"\"\" Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. \"\"\" self . logger . info ( json . dumps ( response_json )) log_save () Save the log. Source code in agents/processors/basic.py 304 305 306 307 308 309 310 311 312 def log_save ( self ) -> None : \"\"\" Save the log. \"\"\" self . _memory_data . add_values_from_dict ( { \"total_time_cost\" : self . _total_time_cost } ) self . log ( self . _memory_data . to_dict ()) method_timer ( func ) classmethod Decorator to calculate the time cost of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. Source code in agents/processors/basic.py 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 @classmethod def method_timer ( cls , func ): \"\"\" Decorator to calculate the time cost of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): start_time = time . time () result = func ( self , * args , ** kwargs ) end_time = time . time () self . _time_cost [ func . __name__ ] = end_time - start_time return result return wrapper parse_response () abstractmethod Parse the response. Source code in agents/processors/basic.py 263 264 265 266 267 268 @abstractmethod def parse_response ( self ) -> None : \"\"\" Parse the response. \"\"\" pass print_step_info () abstractmethod Print the step information. Source code in agents/processors/basic.py 228 229 230 231 232 233 @abstractmethod def print_step_info ( self ) -> None : \"\"\" Print the step information. \"\"\" pass process () Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. Update the step and status. 11. Save the log. Source code in agents/processors/basic.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def process ( self ) -> None : \"\"\" Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. Update the step and status. 11. Save the log. \"\"\" start_time = time . time () try : # Step 1: Print the step information. self . print_step_info () # Step 2: Capture the screenshot. self . capture_screenshot () # Step 3: Get the control information. self . get_control_info () # Step 4: Get the prompt message. self . get_prompt_message () # Step 5: Get the response. self . get_response () # Step 6: Update the context. self . update_cost () # Step 7: Parse the response, if there is no error. self . parse_response () if self . is_pending () or self . is_paused (): # If the session is pending, update the step and memory, and return. if self . is_pending (): self . update_status () self . update_memory () return # Step 8: Execute the action. self . execute_action () # Step 9: Update the memory. self . update_memory () # Step 10: Update the status. self . update_status () self . _total_time_cost = time . time () - start_time # Step 11: Save the log. self . log_save () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. return resume () Resume the process of action execution after the session is paused. Source code in agents/processors/basic.py 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 def resume ( self ) -> None : \"\"\" Resume the process of action execution after the session is paused. \"\"\" self . _is_resumed = True try : # Step 1: Execute the action. self . execute_action () # Step 2: Update the memory. self . update_memory () # Step 3: Update the status. self . update_status () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. pass finally : self . _is_resumed = False string2list ( string ) staticmethod Convert a string to a list of string if the input is a string. Parameters: string ( Any ) \u2013 The string. Returns: List [ str ] \u2013 The list. Source code in agents/processors/basic.py 764 765 766 767 768 769 770 771 772 773 774 @staticmethod def string2list ( string : Any ) -> List [ str ]: \"\"\" Convert a string to a list of string if the input is a string. :param string: The string. :return: The list. \"\"\" if isinstance ( string , str ): return [ string ] else : return string sync_memory () abstractmethod Sync the memory of the Agent. Source code in agents/processors/basic.py 221 222 223 224 225 226 @abstractmethod def sync_memory ( self ) -> None : \"\"\" Sync the memory of the Agent. \"\"\" pass update_cost () Update the cost. Source code in agents/processors/basic.py 322 323 324 325 326 327 328 def update_cost ( self ) -> None : \"\"\" Update the cost. \"\"\" self . round_cost += self . cost self . session_cost += self . cost update_memory () abstractmethod Update the memory of the Agent. Source code in agents/processors/basic.py 277 278 279 280 281 282 @abstractmethod def update_memory ( self ) -> None : \"\"\" Update the memory of the Agent. \"\"\" pass update_status () Update the status of the session. Source code in agents/processors/basic.py 284 285 286 287 288 289 290 291 292 293 294 295 def update_status ( self ) -> None : \"\"\" Update the status of the session. \"\"\" self . agent . step += 1 self . agent . status = self . status if self . status != self . _agent_status_manager . FINISH . value : time . sleep ( configs [ \"SLEEP_TIME\" ]) self . round_step += 1 self . session_step += 1","title":"Processor"},{"location":"agents/design/processor/#agents-processor","text":"The Processor is a key component of the agent to process the core logic of the agent to process the user's request. The Processor is implemented as a class in the ufo/agents/processors folder. Each agent has its own Processor class withing the folder.","title":"Agents Processor"},{"location":"agents/design/processor/#core-process","text":"Once called, an agent follows a series of steps to process the user's request defined in the Processor class by calling the process method. The workflow of the process is as follows: Step Description Function 1 Print the step information. print_step_info 2 Capture the screenshot of the application. capture_screenshot 3 Get the control information of the application. get_control_info 4 Get the prompt message for the LLM. get_prompt_message 5 Generate the response from the LLM. get_response 6 Update the cost of the step. update_cost 7 Parse the response from the LLM. parse_response 8 Execute the action based on the response. execute_action 9 Update the memory and blackboard. update_memory 10 Update the status of the agent. update_status At each step, the Processor processes the user's request by invoking the corresponding method sequentially to execute the necessary actions. The process may be paused. It can be resumed, based on the agent's logic and the user's request using the resume method.","title":"Core Process"},{"location":"agents/design/processor/#reference","text":"Below is the basic structure of the Processor class: Bases: ABC The base processor for the session. A session consists of multiple rounds of conversation with the user, completing a task. At each round, the HostAgent and AppAgent interact with the user and the application with the processor. Each processor is responsible for processing the user request and updating the HostAgent and AppAgent at a single step in a round. Initialize the processor. Parameters: context ( Context ) \u2013 The context of the session. agent ( BasicAgent ) \u2013 The agent who executes the processor. Source code in agents/processors/basic.py 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def __init__ ( self , agent : BasicAgent , context : Context ) -> None : \"\"\" Initialize the processor. :param context: The context of the session. :param agent: The agent who executes the processor. \"\"\" self . _context = context self . _agent = agent self . photographer = PhotographerFacade () self . control_inspector = ControlInspectorFacade ( BACKEND ) self . _prompt_message = None self . _status = None self . _response = None self . _cost = 0 self . _control_label = None self . _control_text = None self . _response_json = {} self . _memory_data = MemoryItem () self . _results = None self . _question_list = [] self . _agent_status_manager = self . agent . status_manager self . _is_resumed = False self . _action = None self . _plan = None self . _control_log = { \"control_class\" : None , \"control_type\" : None , \"control_automation_id\" : None , } self . _total_time_cost = 0 self . _time_cost = {} self . _exeception_traceback = {}","title":"Reference"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.action","text":"Get the action. Returns: str \u2013 The action.","title":"action"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.agent","text":"Get the agent. Returns: BasicAgent \u2013 The agent.","title":"agent"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.app_root","text":"Get the application root. Returns: str \u2013 The application root.","title":"app_root"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.application_process_name","text":"Get the application process name. Returns: str \u2013 The application process name.","title":"application_process_name"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.application_window","text":"Get the active window. Returns: UIAWrapper \u2013 The active window.","title":"application_window"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.context","text":"Get the context. Returns: Context \u2013 The context.","title":"context"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.control_label","text":"Get the control label. Returns: str \u2013 The control label.","title":"control_label"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.control_reannotate","text":"Get the control reannotation. Returns: List [ str ] \u2013 The control reannotation.","title":"control_reannotate"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.control_text","text":"Get the active application. Returns: str \u2013 The active application.","title":"control_text"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.cost","text":"Get the cost of the processor. Returns: float \u2013 The cost of the processor.","title":"cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.host_message","text":"Get the host message. Returns: List [ str ] \u2013 The host message.","title":"host_message"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.log_path","text":"Get the log path. Returns: str \u2013 The log path.","title":"log_path"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.logger","text":"Get the logger. Returns: str \u2013 The logger.","title":"logger"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.name","text":"Get the name of the processor. Returns: str \u2013 The name of the processor.","title":"name"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.plan","text":"Get the plan of the agent. Returns: str \u2013 The plan.","title":"plan"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.prev_plan","text":"Get the previous plan. Returns: List [ str ] \u2013 The previous plan of the agent.","title":"prev_plan"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.previous_subtasks","text":"Get the previous subtasks. Returns: List [ str ] \u2013 The previous subtasks.","title":"previous_subtasks"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.question_list","text":"Get the question list. Returns: List [ str ] \u2013 The question list.","title":"question_list"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.request","text":"Get the request. Returns: str \u2013 The request.","title":"request"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.request_logger","text":"Get the request logger. Returns: str \u2013 The request logger.","title":"request_logger"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_cost","text":"Get the round cost. Returns: float \u2013 The round cost.","title":"round_cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_num","text":"Get the round number. Returns: int \u2013 The round number.","title":"round_num"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_step","text":"Get the round step. Returns: int \u2013 The round step.","title":"round_step"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_subtask_amount","text":"Get the round subtask amount. Returns: int \u2013 The round subtask amount.","title":"round_subtask_amount"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.session_cost","text":"Get the session cost. Returns: float \u2013 The session cost.","title":"session_cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.session_step","text":"Get the session step. Returns: int \u2013 The session step.","title":"session_step"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.status","text":"Get the status of the processor. Returns: str \u2013 The status of the processor.","title":"status"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.subtask","text":"Get the subtask. Returns: str \u2013 The subtask.","title":"subtask"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.ui_tree_path","text":"Get the UI tree path. Returns: str \u2013 The UI tree path.","title":"ui_tree_path"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.add_to_memory","text":"Add the data to the memory. Parameters: data_dict ( Dict [ str , Any ] ) \u2013 The data dictionary to be added to the memory. Source code in agents/processors/basic.py 297 298 299 300 301 302 def add_to_memory ( self , data_dict : Dict [ str , Any ]) -> None : \"\"\" Add the data to the memory. :param data_dict: The data dictionary to be added to the memory. \"\"\" self . _memory_data . add_values_from_dict ( data_dict )","title":"add_to_memory"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.capture_screenshot","text":"Capture the screenshot. Source code in agents/processors/basic.py 235 236 237 238 239 240 @abstractmethod def capture_screenshot ( self ) -> None : \"\"\" Capture the screenshot. \"\"\" pass","title":"capture_screenshot"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.exception_capture","text":"Decorator to capture the exception of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. Source code in agents/processors/basic.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 @classmethod def exception_capture ( cls , func ): \"\"\" Decorator to capture the exception of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): try : func ( self , * args , ** kwargs ) except Exception as e : self . _exeception_traceback [ func . __name__ ] = { \"type\" : str ( type ( e ) . __name__ ), \"message\" : str ( e ), \"traceback\" : traceback . format_exc (), } utils . print_with_color ( f \"Error Occurs at { func . __name__ } \" , \"red\" ) utils . print_with_color ( self . _exeception_traceback [ func . __name__ ][ \"traceback\" ], \"red\" ) if self . _response is not None : utils . print_with_color ( \"Response: \" , \"red\" ) utils . print_with_color ( self . _response , \"red\" ) self . _status = self . _agent_status_manager . ERROR . value self . sync_memory () self . add_to_memory ({ \"error\" : self . _exeception_traceback }) self . add_to_memory ({ \"Status\" : self . _status }) self . log_save () raise StopIteration ( \"Error occurred during step.\" ) return wrapper","title":"exception_capture"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.execute_action","text":"Execute the action. Source code in agents/processors/basic.py 270 271 272 273 274 275 @abstractmethod def execute_action ( self ) -> None : \"\"\" Execute the action. \"\"\" pass","title":"execute_action"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.get_control_info","text":"Get the control information. Source code in agents/processors/basic.py 242 243 244 245 246 247 @abstractmethod def get_control_info ( self ) -> None : \"\"\" Get the control information. \"\"\" pass","title":"get_control_info"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.get_prompt_message","text":"Get the prompt message. Source code in agents/processors/basic.py 249 250 251 252 253 254 @abstractmethod def get_prompt_message ( self ) -> None : \"\"\" Get the prompt message. \"\"\" pass","title":"get_prompt_message"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.get_response","text":"Get the response from the LLM. Source code in agents/processors/basic.py 256 257 258 259 260 261 @abstractmethod def get_response ( self ) -> None : \"\"\" Get the response from the LLM. \"\"\" pass","title":"get_response"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_confirm","text":"Check if the process is confirm. Returns: bool \u2013 The boolean value indicating if the process is confirm. Source code in agents/processors/basic.py 736 737 738 739 740 741 742 743 744 def is_confirm ( self ) -> bool : \"\"\" Check if the process is confirm. :return: The boolean value indicating if the process is confirm. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . CONFIRM . value","title":"is_confirm"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_error","text":"Check if the process is in error. Returns: bool \u2013 The boolean value indicating if the process is in error. Source code in agents/processors/basic.py 704 705 706 707 708 709 710 711 def is_error ( self ) -> bool : \"\"\" Check if the process is in error. :return: The boolean value indicating if the process is in error. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . ERROR . value","title":"is_error"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_paused","text":"Check if the process is paused. Returns: bool \u2013 The boolean value indicating if the process is paused. Source code in agents/processors/basic.py 713 714 715 716 717 718 719 720 721 722 723 724 def is_paused ( self ) -> bool : \"\"\" Check if the process is paused. :return: The boolean value indicating if the process is paused. \"\"\" self . agent . status = self . status return ( self . status == self . _agent_status_manager . PENDING . value or self . status == self . _agent_status_manager . CONFIRM . value )","title":"is_paused"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_pending","text":"Check if the process is pending. Returns: bool \u2013 The boolean value indicating if the process is pending. Source code in agents/processors/basic.py 726 727 728 729 730 731 732 733 734 def is_pending ( self ) -> bool : \"\"\" Check if the process is pending. :return: The boolean value indicating if the process is pending. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . PENDING . value","title":"is_pending"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.log","text":"Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. Source code in agents/processors/basic.py 746 747 748 749 750 751 752 753 754 def log ( self , response_json : Dict [ str , Any ]) -> None : \"\"\" Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. \"\"\" self . logger . info ( json . dumps ( response_json ))","title":"log"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.log_save","text":"Save the log. Source code in agents/processors/basic.py 304 305 306 307 308 309 310 311 312 def log_save ( self ) -> None : \"\"\" Save the log. \"\"\" self . _memory_data . add_values_from_dict ( { \"total_time_cost\" : self . _total_time_cost } ) self . log ( self . _memory_data . to_dict ())","title":"log_save"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.method_timer","text":"Decorator to calculate the time cost of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. Source code in agents/processors/basic.py 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 @classmethod def method_timer ( cls , func ): \"\"\" Decorator to calculate the time cost of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): start_time = time . time () result = func ( self , * args , ** kwargs ) end_time = time . time () self . _time_cost [ func . __name__ ] = end_time - start_time return result return wrapper","title":"method_timer"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.parse_response","text":"Parse the response. Source code in agents/processors/basic.py 263 264 265 266 267 268 @abstractmethod def parse_response ( self ) -> None : \"\"\" Parse the response. \"\"\" pass","title":"parse_response"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.print_step_info","text":"Print the step information. Source code in agents/processors/basic.py 228 229 230 231 232 233 @abstractmethod def print_step_info ( self ) -> None : \"\"\" Print the step information. \"\"\" pass","title":"print_step_info"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.process","text":"Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. Update the step and status. 11. Save the log. Source code in agents/processors/basic.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def process ( self ) -> None : \"\"\" Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. Update the step and status. 11. Save the log. \"\"\" start_time = time . time () try : # Step 1: Print the step information. self . print_step_info () # Step 2: Capture the screenshot. self . capture_screenshot () # Step 3: Get the control information. self . get_control_info () # Step 4: Get the prompt message. self . get_prompt_message () # Step 5: Get the response. self . get_response () # Step 6: Update the context. self . update_cost () # Step 7: Parse the response, if there is no error. self . parse_response () if self . is_pending () or self . is_paused (): # If the session is pending, update the step and memory, and return. if self . is_pending (): self . update_status () self . update_memory () return # Step 8: Execute the action. self . execute_action () # Step 9: Update the memory. self . update_memory () # Step 10: Update the status. self . update_status () self . _total_time_cost = time . time () - start_time # Step 11: Save the log. self . log_save () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. return","title":"process"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.resume","text":"Resume the process of action execution after the session is paused. Source code in agents/processors/basic.py 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 def resume ( self ) -> None : \"\"\" Resume the process of action execution after the session is paused. \"\"\" self . _is_resumed = True try : # Step 1: Execute the action. self . execute_action () # Step 2: Update the memory. self . update_memory () # Step 3: Update the status. self . update_status () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. pass finally : self . _is_resumed = False","title":"resume"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.string2list","text":"Convert a string to a list of string if the input is a string. Parameters: string ( Any ) \u2013 The string. Returns: List [ str ] \u2013 The list. Source code in agents/processors/basic.py 764 765 766 767 768 769 770 771 772 773 774 @staticmethod def string2list ( string : Any ) -> List [ str ]: \"\"\" Convert a string to a list of string if the input is a string. :param string: The string. :return: The list. \"\"\" if isinstance ( string , str ): return [ string ] else : return string","title":"string2list"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.sync_memory","text":"Sync the memory of the Agent. Source code in agents/processors/basic.py 221 222 223 224 225 226 @abstractmethod def sync_memory ( self ) -> None : \"\"\" Sync the memory of the Agent. \"\"\" pass","title":"sync_memory"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.update_cost","text":"Update the cost. Source code in agents/processors/basic.py 322 323 324 325 326 327 328 def update_cost ( self ) -> None : \"\"\" Update the cost. \"\"\" self . round_cost += self . cost self . session_cost += self . cost","title":"update_cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.update_memory","text":"Update the memory of the Agent. Source code in agents/processors/basic.py 277 278 279 280 281 282 @abstractmethod def update_memory ( self ) -> None : \"\"\" Update the memory of the Agent. \"\"\" pass","title":"update_memory"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.update_status","text":"Update the status of the session. Source code in agents/processors/basic.py 284 285 286 287 288 289 290 291 292 293 294 295 def update_status ( self ) -> None : \"\"\" Update the status of the session. \"\"\" self . agent . step += 1 self . agent . status = self . status if self . status != self . _agent_status_manager . FINISH . value : time . sleep ( configs [ \"SLEEP_TIME\" ]) self . round_step += 1 self . session_step += 1","title":"update_status"},{"location":"agents/design/prompter/","text":"Agent Prompter The Prompter is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. The Prompter is implemented in the ufo/prompts folder. Each agent has its own Prompter class that defines the structure of the prompt and the information to be fed to the LLM. Components A prompt fed to the LLM usually a list of dictionaries, where each dictionary contains the following keys: Key Description role The role of the text in the prompt, can be system , user , or assistant . content The content of the text for the specific role. Tip You may find the official documentation helpful for constructing the prompt. In the __init__ method of the Prompter class, you can define the template of the prompt for each component, and the final prompt message is constructed by combining the templates of each component using the prompt_construction method. System Prompt The system prompt use the template configured in the config_dev.yaml file for each agent. It usually contains the instructions for the agent's role, action, tips, reponse format, etc. You need use the system_prompt_construction method to construct the system prompt. Prompts on the API instructions, and demonstration examples are also included in the system prompt, which are constructed by the api_prompt_helper and examples_prompt_helper methods respectively. Below is the sub-components of the system prompt: Component Description Method apis The API instructions for the agent. api_prompt_helper examples The demonstration examples for the agent. examples_prompt_helper User Prompt The user prompt is constructed based on the information from the agent's observation, external knowledge, and Blackboard . You can use the user_prompt_construction method to construct the user prompt. Below is the sub-components of the user prompt: Component Description Method observation The observation of the agent. user_content_construction retrieved_docs The knowledge retrieved from the external knowledge base. retrived_documents_prompt_helper blackboard The information stored in the Blackboard . blackboard_to_prompt Reference You can find the implementation of the Prompter in the ufo/prompts folder. Below is the basic structure of the Prompter class: Bases: ABC The BasicPrompter class is the abstract class for the prompter. Initialize the BasicPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. example_prompt_template ( str ) \u2013 The path of the example prompt template. Source code in prompter/basic.py 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 def __init__ ( self , is_visual : bool , prompt_template : str , example_prompt_template : str ): \"\"\" Initialize the BasicPrompter. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param example_prompt_template: The path of the example prompt template. \"\"\" self . is_visual = is_visual if prompt_template : self . prompt_template = self . load_prompt_template ( prompt_template , is_visual ) else : self . prompt_template = \"\" if example_prompt_template : self . example_prompt_template = self . load_prompt_template ( example_prompt_template , is_visual ) else : self . example_prompt_template = \"\" api_prompt_helper () A helper function to construct the API list and descriptions for the prompt. Source code in prompter/basic.py 139 140 141 142 143 144 def api_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the API list and descriptions for the prompt. \"\"\" pass examples_prompt_helper () A helper function to construct the examples prompt for in-context learning. Source code in prompter/basic.py 132 133 134 135 136 137 def examples_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the examples prompt for in-context learning. \"\"\" pass load_prompt_template ( template_path , is_visual = None ) staticmethod Load the prompt template. Returns: Dict [ str , str ] \u2013 The prompt template. Source code in prompter/basic.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 @staticmethod def load_prompt_template ( template_path : str , is_visual = None ) -> Dict [ str , str ]: \"\"\" Load the prompt template. :return: The prompt template. \"\"\" if is_visual == None : path = template_path else : path = template_path . format ( mode = \"visual\" if is_visual == True else \"nonvisual\" ) if not path : return {} if os . path . exists ( path ): try : prompt = yaml . safe_load ( open ( path , \"r\" , encoding = \"utf-8\" )) except yaml . YAMLError as exc : print_with_color ( f \"Error loading prompt template: { exc } \" , \"yellow\" ) else : raise FileNotFoundError ( f \"Prompt template not found at { path } \" ) return prompt prompt_construction ( system_prompt , user_content ) staticmethod Construct the prompt for summarizing the experience into an example. Parameters: user_content ( List [ Dict [ str , str ]] ) \u2013 The user content. return: The prompt for summarizing the experience into an example. Source code in prompter/basic.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 @staticmethod def prompt_construction ( system_prompt : str , user_content : List [ Dict [ str , str ]] ) -> List : \"\"\" Construct the prompt for summarizing the experience into an example. :param user_content: The user content. return: The prompt for summarizing the experience into an example. \"\"\" system_message = { \"role\" : \"system\" , \"content\" : system_prompt } user_message = { \"role\" : \"user\" , \"content\" : user_content } prompt_message = [ system_message , user_message ] return prompt_message retrived_documents_prompt_helper ( header , separator , documents ) staticmethod Construct the prompt for retrieved documents. Parameters: header ( str ) \u2013 The header of the prompt. separator ( str ) \u2013 The separator of the prompt. documents ( List [ str ] ) \u2013 The retrieved documents. return: The prompt for retrieved documents. Source code in prompter/basic.py 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 @staticmethod def retrived_documents_prompt_helper ( header : str , separator : str , documents : List [ str ] ) -> str : \"\"\" Construct the prompt for retrieved documents. :param header: The header of the prompt. :param separator: The separator of the prompt. :param documents: The retrieved documents. return: The prompt for retrieved documents. \"\"\" if header : prompt = \" \\n < {header} :> \\n \" . format ( header = header ) else : prompt = \"\" for i , document in enumerate ( documents ): if separator : prompt += \"[ {separator} {i} :]\" . format ( separator = separator , i = i + 1 ) prompt += \" \\n \" prompt += document prompt += \" \\n\\n \" return prompt system_prompt_construction () abstractmethod Construct the system prompt for LLM. Source code in prompter/basic.py 108 109 110 111 112 113 114 @abstractmethod def system_prompt_construction ( self ) -> str : \"\"\" Construct the system prompt for LLM. \"\"\" pass user_content_construction () abstractmethod Construct the full user content for LLM, including the user prompt and images. Source code in prompter/basic.py 124 125 126 127 128 129 130 @abstractmethod def user_content_construction ( self ) -> str : \"\"\" Construct the full user content for LLM, including the user prompt and images. \"\"\" pass user_prompt_construction () abstractmethod Construct the textual user prompt for LLM based on the user field in the prompt template. Source code in prompter/basic.py 116 117 118 119 120 121 122 @abstractmethod def user_prompt_construction ( self ) -> str : \"\"\" Construct the textual user prompt for LLM based on the `user` field in the prompt template. \"\"\" pass Tip You can customize the Prompter class to tailor the prompt to your requirements.","title":"Prompter"},{"location":"agents/design/prompter/#agent-prompter","text":"The Prompter is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. The Prompter is implemented in the ufo/prompts folder. Each agent has its own Prompter class that defines the structure of the prompt and the information to be fed to the LLM.","title":"Agent Prompter"},{"location":"agents/design/prompter/#components","text":"A prompt fed to the LLM usually a list of dictionaries, where each dictionary contains the following keys: Key Description role The role of the text in the prompt, can be system , user , or assistant . content The content of the text for the specific role. Tip You may find the official documentation helpful for constructing the prompt. In the __init__ method of the Prompter class, you can define the template of the prompt for each component, and the final prompt message is constructed by combining the templates of each component using the prompt_construction method.","title":"Components"},{"location":"agents/design/prompter/#system-prompt","text":"The system prompt use the template configured in the config_dev.yaml file for each agent. It usually contains the instructions for the agent's role, action, tips, reponse format, etc. You need use the system_prompt_construction method to construct the system prompt. Prompts on the API instructions, and demonstration examples are also included in the system prompt, which are constructed by the api_prompt_helper and examples_prompt_helper methods respectively. Below is the sub-components of the system prompt: Component Description Method apis The API instructions for the agent. api_prompt_helper examples The demonstration examples for the agent. examples_prompt_helper","title":"System Prompt"},{"location":"agents/design/prompter/#user-prompt","text":"The user prompt is constructed based on the information from the agent's observation, external knowledge, and Blackboard . You can use the user_prompt_construction method to construct the user prompt. Below is the sub-components of the user prompt: Component Description Method observation The observation of the agent. user_content_construction retrieved_docs The knowledge retrieved from the external knowledge base. retrived_documents_prompt_helper blackboard The information stored in the Blackboard . blackboard_to_prompt","title":"User Prompt"},{"location":"agents/design/prompter/#reference","text":"You can find the implementation of the Prompter in the ufo/prompts folder. Below is the basic structure of the Prompter class: Bases: ABC The BasicPrompter class is the abstract class for the prompter. Initialize the BasicPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. example_prompt_template ( str ) \u2013 The path of the example prompt template. Source code in prompter/basic.py 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 def __init__ ( self , is_visual : bool , prompt_template : str , example_prompt_template : str ): \"\"\" Initialize the BasicPrompter. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param example_prompt_template: The path of the example prompt template. \"\"\" self . is_visual = is_visual if prompt_template : self . prompt_template = self . load_prompt_template ( prompt_template , is_visual ) else : self . prompt_template = \"\" if example_prompt_template : self . example_prompt_template = self . load_prompt_template ( example_prompt_template , is_visual ) else : self . example_prompt_template = \"\"","title":"Reference"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.api_prompt_helper","text":"A helper function to construct the API list and descriptions for the prompt. Source code in prompter/basic.py 139 140 141 142 143 144 def api_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the API list and descriptions for the prompt. \"\"\" pass","title":"api_prompt_helper"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.examples_prompt_helper","text":"A helper function to construct the examples prompt for in-context learning. Source code in prompter/basic.py 132 133 134 135 136 137 def examples_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the examples prompt for in-context learning. \"\"\" pass","title":"examples_prompt_helper"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.load_prompt_template","text":"Load the prompt template. Returns: Dict [ str , str ] \u2013 The prompt template. Source code in prompter/basic.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 @staticmethod def load_prompt_template ( template_path : str , is_visual = None ) -> Dict [ str , str ]: \"\"\" Load the prompt template. :return: The prompt template. \"\"\" if is_visual == None : path = template_path else : path = template_path . format ( mode = \"visual\" if is_visual == True else \"nonvisual\" ) if not path : return {} if os . path . exists ( path ): try : prompt = yaml . safe_load ( open ( path , \"r\" , encoding = \"utf-8\" )) except yaml . YAMLError as exc : print_with_color ( f \"Error loading prompt template: { exc } \" , \"yellow\" ) else : raise FileNotFoundError ( f \"Prompt template not found at { path } \" ) return prompt","title":"load_prompt_template"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.prompt_construction","text":"Construct the prompt for summarizing the experience into an example. Parameters: user_content ( List [ Dict [ str , str ]] ) \u2013 The user content. return: The prompt for summarizing the experience into an example. Source code in prompter/basic.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 @staticmethod def prompt_construction ( system_prompt : str , user_content : List [ Dict [ str , str ]] ) -> List : \"\"\" Construct the prompt for summarizing the experience into an example. :param user_content: The user content. return: The prompt for summarizing the experience into an example. \"\"\" system_message = { \"role\" : \"system\" , \"content\" : system_prompt } user_message = { \"role\" : \"user\" , \"content\" : user_content } prompt_message = [ system_message , user_message ] return prompt_message","title":"prompt_construction"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.retrived_documents_prompt_helper","text":"Construct the prompt for retrieved documents. Parameters: header ( str ) \u2013 The header of the prompt. separator ( str ) \u2013 The separator of the prompt. documents ( List [ str ] ) \u2013 The retrieved documents. return: The prompt for retrieved documents. Source code in prompter/basic.py 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 @staticmethod def retrived_documents_prompt_helper ( header : str , separator : str , documents : List [ str ] ) -> str : \"\"\" Construct the prompt for retrieved documents. :param header: The header of the prompt. :param separator: The separator of the prompt. :param documents: The retrieved documents. return: The prompt for retrieved documents. \"\"\" if header : prompt = \" \\n < {header} :> \\n \" . format ( header = header ) else : prompt = \"\" for i , document in enumerate ( documents ): if separator : prompt += \"[ {separator} {i} :]\" . format ( separator = separator , i = i + 1 ) prompt += \" \\n \" prompt += document prompt += \" \\n\\n \" return prompt","title":"retrived_documents_prompt_helper"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.system_prompt_construction","text":"Construct the system prompt for LLM. Source code in prompter/basic.py 108 109 110 111 112 113 114 @abstractmethod def system_prompt_construction ( self ) -> str : \"\"\" Construct the system prompt for LLM. \"\"\" pass","title":"system_prompt_construction"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.user_content_construction","text":"Construct the full user content for LLM, including the user prompt and images. Source code in prompter/basic.py 124 125 126 127 128 129 130 @abstractmethod def user_content_construction ( self ) -> str : \"\"\" Construct the full user content for LLM, including the user prompt and images. \"\"\" pass","title":"user_content_construction"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.user_prompt_construction","text":"Construct the textual user prompt for LLM based on the user field in the prompt template. Source code in prompter/basic.py 116 117 118 119 120 121 122 @abstractmethod def user_prompt_construction ( self ) -> str : \"\"\" Construct the textual user prompt for LLM based on the `user` field in the prompt template. \"\"\" pass Tip You can customize the Prompter class to tailor the prompt to your requirements.","title":"user_prompt_construction"},{"location":"agents/design/state/","text":"Agent State The State class is a fundamental component of the UFO agent framework. It represents the current state of the agent and determines the next action and agent to handle the request. Each agent has a specific set of states that define the agent's behavior and workflow. AgentStatus The set of states for an agent is defined in the AgentStatus class: class AgentStatus(Enum): \"\"\" The status class for the agent. \"\"\" ERROR = \"ERROR\" FINISH = \"FINISH\" CONTINUE = \"CONTINUE\" FAIL = \"FAIL\" PENDING = \"PENDING\" CONFIRM = \"CONFIRM\" SCREENSHOT = \"SCREENSHOT\" Each agent implements its own set of AgentStatus to define the states of the agent. AgentStateManager The class AgentStateManager manages the state mapping from a string to the corresponding state class. Each state class is registered with the AgentStateManager using the register decorator to associate the state class with a specific agent, e.g., @AgentStateManager.register class SomeAgentState(AgentState): \"\"\" The state class for the some agent. \"\"\" Tip You can find examples on how to register the state class for the AppAgent in the ufo/agents/states/app_agent_state.py file. Below is the basic structure of the AgentStateManager class: class AgentStateManager(ABC, metaclass=SingletonABCMeta): \"\"\" A abstract class to manage the states of the agent. \"\"\" _state_mapping: Dict[str, Type[AgentState]] = {} def __init__(self): \"\"\" Initialize the state manager. \"\"\" self._state_instance_mapping: Dict[str, AgentState] = {} def get_state(self, status: str) -> AgentState: \"\"\" Get the state for the status. :param status: The status string. :return: The state object. \"\"\" # Lazy load the state class if status not in self._state_instance_mapping: state_class = self._state_mapping.get(status) if state_class: self._state_instance_mapping[status] = state_class() else: self._state_instance_mapping[status] = self.none_state state = self._state_instance_mapping.get(status, self.none_state) return state def add_state(self, status: str, state: AgentState) -> None: \"\"\" Add a new state to the state mapping. :param status: The status string. :param state: The state object. \"\"\" self.state_map[status] = state @property def state_map(self) -> Dict[str, AgentState]: \"\"\" The state mapping of status to state. :return: The state mapping. \"\"\" return self._state_instance_mapping @classmethod def register(cls, state_class: Type[AgentState]) -> Type[AgentState]: \"\"\" Decorator to register the state class to the state manager. :param state_class: The state class to be registered. :return: The state class. \"\"\" cls._state_mapping[state_class.name()] = state_class return state_class @property @abstractmethod def none_state(self) -> AgentState: \"\"\" The none state of the state manager. \"\"\" pass AgentState Each state class inherits from the AgentState class and must implement the method of handle to process the action in the state. In addition, the next_state and next_agent methods are used to determine the next state and agent to handle the transition. Please find below the reference for the State class in UFO. Bases: ABC The abstract class for the agent state. agent_class () abstractmethod classmethod The class of the agent. Returns: Type [ BasicAgent ] \u2013 The class of the agent. Source code in agents/states/basic.py 165 166 167 168 169 170 171 172 @classmethod @abstractmethod def agent_class ( cls ) -> Type [ BasicAgent ]: \"\"\" The class of the agent. :return: The class of the agent. \"\"\" pass handle ( agent , context = None ) abstractmethod Handle the agent for the current step. Parameters: agent ( BasicAgent ) \u2013 The agent to handle. context ( Optional ['Context'] , default: None ) \u2013 The context for the agent and session. Source code in agents/states/basic.py 122 123 124 125 126 127 128 129 @abstractmethod def handle ( self , agent : BasicAgent , context : Optional [ \"Context\" ] = None ) -> None : \"\"\" Handle the agent for the current step. :param agent: The agent to handle. :param context: The context for the agent and session. \"\"\" pass is_round_end () abstractmethod Check if the round ends. Returns: bool \u2013 True if the round ends, False otherwise. Source code in agents/states/basic.py 149 150 151 152 153 154 155 @abstractmethod def is_round_end ( self ) -> bool : \"\"\" Check if the round ends. :return: True if the round ends, False otherwise. \"\"\" pass is_subtask_end () abstractmethod Check if the subtask ends. Returns: bool \u2013 True if the subtask ends, False otherwise. Source code in agents/states/basic.py 157 158 159 160 161 162 163 @abstractmethod def is_subtask_end ( self ) -> bool : \"\"\" Check if the subtask ends. :return: True if the subtask ends, False otherwise. \"\"\" pass name () abstractmethod classmethod The class name of the state. Returns: str \u2013 The class name of the state. Source code in agents/states/basic.py 174 175 176 177 178 179 180 181 @classmethod @abstractmethod def name ( cls ) -> str : \"\"\" The class name of the state. :return: The class name of the state. \"\"\" return \"\" next_agent ( agent ) abstractmethod Get the agent for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: BasicAgent \u2013 The agent for the next step. Source code in agents/states/basic.py 131 132 133 134 135 136 137 138 @abstractmethod def next_agent ( self , agent : BasicAgent ) -> BasicAgent : \"\"\" Get the agent for the next step. :param agent: The agent for the current step. :return: The agent for the next step. \"\"\" return agent next_state ( agent ) abstractmethod Get the state for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: AgentState \u2013 The state for the next step. Source code in agents/states/basic.py 140 141 142 143 144 145 146 147 @abstractmethod def next_state ( self , agent : BasicAgent ) -> AgentState : \"\"\" Get the state for the next step. :param agent: The agent for the current step. :return: The state for the next step. \"\"\" pass Tip The state machine diagrams for the HostAgent and AppAgent are shown in their respective documents. Tip A Round calls the handle , next_state , and next_agent methods of the current state to process the user request and determine the next state and agent to handle the request, and orchestrates the agents to execute the necessary actions.","title":"State"},{"location":"agents/design/state/#agent-state","text":"The State class is a fundamental component of the UFO agent framework. It represents the current state of the agent and determines the next action and agent to handle the request. Each agent has a specific set of states that define the agent's behavior and workflow.","title":"Agent State"},{"location":"agents/design/state/#agentstatus","text":"The set of states for an agent is defined in the AgentStatus class: class AgentStatus(Enum): \"\"\" The status class for the agent. \"\"\" ERROR = \"ERROR\" FINISH = \"FINISH\" CONTINUE = \"CONTINUE\" FAIL = \"FAIL\" PENDING = \"PENDING\" CONFIRM = \"CONFIRM\" SCREENSHOT = \"SCREENSHOT\" Each agent implements its own set of AgentStatus to define the states of the agent.","title":"AgentStatus"},{"location":"agents/design/state/#agentstatemanager","text":"The class AgentStateManager manages the state mapping from a string to the corresponding state class. Each state class is registered with the AgentStateManager using the register decorator to associate the state class with a specific agent, e.g., @AgentStateManager.register class SomeAgentState(AgentState): \"\"\" The state class for the some agent. \"\"\" Tip You can find examples on how to register the state class for the AppAgent in the ufo/agents/states/app_agent_state.py file. Below is the basic structure of the AgentStateManager class: class AgentStateManager(ABC, metaclass=SingletonABCMeta): \"\"\" A abstract class to manage the states of the agent. \"\"\" _state_mapping: Dict[str, Type[AgentState]] = {} def __init__(self): \"\"\" Initialize the state manager. \"\"\" self._state_instance_mapping: Dict[str, AgentState] = {} def get_state(self, status: str) -> AgentState: \"\"\" Get the state for the status. :param status: The status string. :return: The state object. \"\"\" # Lazy load the state class if status not in self._state_instance_mapping: state_class = self._state_mapping.get(status) if state_class: self._state_instance_mapping[status] = state_class() else: self._state_instance_mapping[status] = self.none_state state = self._state_instance_mapping.get(status, self.none_state) return state def add_state(self, status: str, state: AgentState) -> None: \"\"\" Add a new state to the state mapping. :param status: The status string. :param state: The state object. \"\"\" self.state_map[status] = state @property def state_map(self) -> Dict[str, AgentState]: \"\"\" The state mapping of status to state. :return: The state mapping. \"\"\" return self._state_instance_mapping @classmethod def register(cls, state_class: Type[AgentState]) -> Type[AgentState]: \"\"\" Decorator to register the state class to the state manager. :param state_class: The state class to be registered. :return: The state class. \"\"\" cls._state_mapping[state_class.name()] = state_class return state_class @property @abstractmethod def none_state(self) -> AgentState: \"\"\" The none state of the state manager. \"\"\" pass","title":"AgentStateManager"},{"location":"agents/design/state/#agentstate","text":"Each state class inherits from the AgentState class and must implement the method of handle to process the action in the state. In addition, the next_state and next_agent methods are used to determine the next state and agent to handle the transition. Please find below the reference for the State class in UFO. Bases: ABC The abstract class for the agent state.","title":"AgentState"},{"location":"agents/design/state/#agents.states.basic.AgentState.agent_class","text":"The class of the agent. Returns: Type [ BasicAgent ] \u2013 The class of the agent. Source code in agents/states/basic.py 165 166 167 168 169 170 171 172 @classmethod @abstractmethod def agent_class ( cls ) -> Type [ BasicAgent ]: \"\"\" The class of the agent. :return: The class of the agent. \"\"\" pass","title":"agent_class"},{"location":"agents/design/state/#agents.states.basic.AgentState.handle","text":"Handle the agent for the current step. Parameters: agent ( BasicAgent ) \u2013 The agent to handle. context ( Optional ['Context'] , default: None ) \u2013 The context for the agent and session. Source code in agents/states/basic.py 122 123 124 125 126 127 128 129 @abstractmethod def handle ( self , agent : BasicAgent , context : Optional [ \"Context\" ] = None ) -> None : \"\"\" Handle the agent for the current step. :param agent: The agent to handle. :param context: The context for the agent and session. \"\"\" pass","title":"handle"},{"location":"agents/design/state/#agents.states.basic.AgentState.is_round_end","text":"Check if the round ends. Returns: bool \u2013 True if the round ends, False otherwise. Source code in agents/states/basic.py 149 150 151 152 153 154 155 @abstractmethod def is_round_end ( self ) -> bool : \"\"\" Check if the round ends. :return: True if the round ends, False otherwise. \"\"\" pass","title":"is_round_end"},{"location":"agents/design/state/#agents.states.basic.AgentState.is_subtask_end","text":"Check if the subtask ends. Returns: bool \u2013 True if the subtask ends, False otherwise. Source code in agents/states/basic.py 157 158 159 160 161 162 163 @abstractmethod def is_subtask_end ( self ) -> bool : \"\"\" Check if the subtask ends. :return: True if the subtask ends, False otherwise. \"\"\" pass","title":"is_subtask_end"},{"location":"agents/design/state/#agents.states.basic.AgentState.name","text":"The class name of the state. Returns: str \u2013 The class name of the state. Source code in agents/states/basic.py 174 175 176 177 178 179 180 181 @classmethod @abstractmethod def name ( cls ) -> str : \"\"\" The class name of the state. :return: The class name of the state. \"\"\" return \"\"","title":"name"},{"location":"agents/design/state/#agents.states.basic.AgentState.next_agent","text":"Get the agent for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: BasicAgent \u2013 The agent for the next step. Source code in agents/states/basic.py 131 132 133 134 135 136 137 138 @abstractmethod def next_agent ( self , agent : BasicAgent ) -> BasicAgent : \"\"\" Get the agent for the next step. :param agent: The agent for the current step. :return: The agent for the next step. \"\"\" return agent","title":"next_agent"},{"location":"agents/design/state/#agents.states.basic.AgentState.next_state","text":"Get the state for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: AgentState \u2013 The state for the next step. Source code in agents/states/basic.py 140 141 142 143 144 145 146 147 @abstractmethod def next_state ( self , agent : BasicAgent ) -> AgentState : \"\"\" Get the state for the next step. :param agent: The agent for the current step. :return: The state for the next step. \"\"\" pass Tip The state machine diagrams for the HostAgent and AppAgent are shown in their respective documents. Tip A Round calls the handle , next_state , and next_agent methods of the current state to process the user request and determine the next state and agent to handle the request, and orchestrates the agents to execute the necessary actions.","title":"next_state"},{"location":"automator/ai_tool_automator/","text":"AI Tool Automator The AI Tool Automator is a component of the UFO framework that enables the agent to interact with AI tools based on large language models (LLMs). The AI Tool Automator is designed to facilitate the integration of LLM-based AI tools into the UFO framework, enabling the agent to leverage the capabilities of these tools to perform complex tasks. Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. These in-app AI tools differ from the AI Tool Automator, which is designed to interact with external AI tools based on LLMs that are not integrated into the application. Configuration The AI Tool Automator shares the same prompt configuration options as the UI Automator: Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" Receiver The AI Tool Automator shares the same receiver structure as the UI Automator. Please refer to the UI Automator Receiver section for more details. Command The command of the AI Tool Automator shares the same structure as the UI Automator. Please refer to the UI Automator Command section for more details. The list of available commands in the AI Tool Automator is shown below: Command Name Function Name Description AnnotationCommand annotation Annotate the control items on the screenshot. SummaryCommand summary Summarize the observation of the current application window.","title":"AI Tool"},{"location":"automator/ai_tool_automator/#ai-tool-automator","text":"The AI Tool Automator is a component of the UFO framework that enables the agent to interact with AI tools based on large language models (LLMs). The AI Tool Automator is designed to facilitate the integration of LLM-based AI tools into the UFO framework, enabling the agent to leverage the capabilities of these tools to perform complex tasks. Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. These in-app AI tools differ from the AI Tool Automator, which is designed to interact with external AI tools based on LLMs that are not integrated into the application.","title":"AI Tool Automator"},{"location":"automator/ai_tool_automator/#configuration","text":"The AI Tool Automator shares the same prompt configuration options as the UI Automator: Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\"","title":"Configuration"},{"location":"automator/ai_tool_automator/#receiver","text":"The AI Tool Automator shares the same receiver structure as the UI Automator. Please refer to the UI Automator Receiver section for more details.","title":"Receiver"},{"location":"automator/ai_tool_automator/#command","text":"The command of the AI Tool Automator shares the same structure as the UI Automator. Please refer to the UI Automator Command section for more details. The list of available commands in the AI Tool Automator is shown below: Command Name Function Name Description AnnotationCommand annotation Annotate the control items on the screenshot. SummaryCommand summary Summarize the observation of the current application window.","title":"Command"},{"location":"automator/bash_automator/","text":"Bash Automator UFO allows the HostAgent to execute bash commands on the host machine. The bash commands can be used to open applications or execute system commands. The Bash Automator is implemented in the ufo/automator/app_apis/shell module. Note Only HostAgent is currently supported by the Bash Automator. Receiver The Web Automator receiver is the ShellReceiver class defined in the ufo/automator/app_apis/shell/shell_client.py file. Bases: ReceiverBasic The base class for Web COM client using crawl4ai. Initialize the shell client. Source code in automator/app_apis/shell/shell_client.py 19 20 21 22 def __init__ ( self ) -> None : \"\"\" Initialize the shell client. \"\"\" run_shell ( params ) Run the command. Parameters: params ( Dict [ str , Any ] ) \u2013 The parameters of the command. Returns: Any \u2013 The result content. Source code in automator/app_apis/shell/shell_client.py 24 25 26 27 28 29 30 31 32 33 34 def run_shell ( self , params : Dict [ str , Any ]) -> Any : \"\"\" Run the command. :param params: The parameters of the command. :return: The result content. \"\"\" bash_command = params . get ( \"command\" ) result = subprocess . run ( bash_command , shell = True , capture_output = True , text = True ) return result . stdout Command We now only support one command in the Bash Automator to execute a bash command on the host machine. @ShellReceiver.register class RunShellCommand(ShellCommand): \"\"\" The command to run the crawler with various options. \"\"\" def execute(self): \"\"\" Execute the command to run the crawler. :return: The result content. \"\"\" return self.receiver.run_shell(params=self.params) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"run_shell\" Below is the list of available commands in the Web Automator that are currently supported by UFO: Command Name Function Name Description RunShellCommand run_shell Get the content of a web page into a markdown format.","title":"Bash Automator"},{"location":"automator/bash_automator/#bash-automator","text":"UFO allows the HostAgent to execute bash commands on the host machine. The bash commands can be used to open applications or execute system commands. The Bash Automator is implemented in the ufo/automator/app_apis/shell module. Note Only HostAgent is currently supported by the Bash Automator.","title":"Bash Automator"},{"location":"automator/bash_automator/#receiver","text":"The Web Automator receiver is the ShellReceiver class defined in the ufo/automator/app_apis/shell/shell_client.py file. Bases: ReceiverBasic The base class for Web COM client using crawl4ai. Initialize the shell client. Source code in automator/app_apis/shell/shell_client.py 19 20 21 22 def __init__ ( self ) -> None : \"\"\" Initialize the shell client. \"\"\"","title":"Receiver"},{"location":"automator/bash_automator/#automator.app_apis.shell.shell_client.ShellReceiver.run_shell","text":"Run the command. Parameters: params ( Dict [ str , Any ] ) \u2013 The parameters of the command. Returns: Any \u2013 The result content. Source code in automator/app_apis/shell/shell_client.py 24 25 26 27 28 29 30 31 32 33 34 def run_shell ( self , params : Dict [ str , Any ]) -> Any : \"\"\" Run the command. :param params: The parameters of the command. :return: The result content. \"\"\" bash_command = params . get ( \"command\" ) result = subprocess . run ( bash_command , shell = True , capture_output = True , text = True ) return result . stdout","title":"run_shell"},{"location":"automator/bash_automator/#command","text":"We now only support one command in the Bash Automator to execute a bash command on the host machine. @ShellReceiver.register class RunShellCommand(ShellCommand): \"\"\" The command to run the crawler with various options. \"\"\" def execute(self): \"\"\" Execute the command to run the crawler. :return: The result content. \"\"\" return self.receiver.run_shell(params=self.params) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"run_shell\" Below is the list of available commands in the Web Automator that are currently supported by UFO: Command Name Function Name Description RunShellCommand run_shell Get the content of a web page into a markdown format.","title":"Command"},{"location":"automator/overview/","text":"Application Automator The Automator application is a tool that allows UFO to automate and take actions on applications. Currently, UFO supports two types of actions: UI Automation and API . Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. UI Automator - This action type is used to interact with the application's UI controls, such as buttons, text boxes, and menus. UFO uses the UIA or Win32 APIs to interact with the application's UI controls. API - This action type is used to interact with the application's native API. Users and app developers can create their own API actions to interact with specific applications. Web - This action type is used to interact with web applications. UFO uses the crawl4ai library to extract information from web pages. Bash - This action type is used to interact with the command line interface (CLI) of an application. AI Tool - This action type is used to interact with the LLM-based AI tools. Action Design Patterns Actions in UFO are implemented using the command design pattern, which encapsulates a receiver, a command, and an invoker. The receiver is the object that performs the action, the command is the object that encapsulates the action, and the invoker is the object that triggers the action. The basic classes for implementing actions in UFO are as follows: Role Class Description Receiver ufo.automator.basic.ReceiverBasic The base class for all receivers in UFO. Receivers are objects that perform actions on applications. Command ufo.automator.basic.CommandBasic The base class for all commands in UFO. Commands are objects that encapsulate actions to be performed by receivers. Invoker ufo.automator.puppeteer.AppPuppeteer The base class for the invoker in UFO. Invokers are objects that trigger commands to be executed by receivers. The advantage of using the command design pattern in the agent framework is that it allows for the decoupling of the sender and receiver of the action. This decoupling enables the agent to execute actions on different objects without knowing the details of the object or the action being performed, making the agent more flexible and extensible for new actions. Receiver The Receiver is a central component in the Automator application that performs actions on the application. It provides functionalities to interact with the application and execute the action. All available actions are registered in the with the ReceiverManager class. You can find the reference for a basic Receiver class below: Bases: ABC The abstract receiver interface. command_registry : Dict [ str , Type [ CommandBasic ]] property Get the command registry. supported_command_names : List [ str ] property Get the command name list. register ( command_class ) classmethod Decorator to register the state class to the state manager. Parameters: command_class ( Type [ CommandBasic ] ) \u2013 The state class to be registered. Returns: Type [ CommandBasic ] \u2013 The state class. Source code in automator/basic.py 46 47 48 49 50 51 52 53 54 @classmethod def register ( cls , command_class : Type [ CommandBasic ]) -> Type [ CommandBasic ]: \"\"\" Decorator to register the state class to the state manager. :param command_class: The state class to be registered. :return: The state class. \"\"\" cls . _command_registry [ command_class . name ()] = command_class return command_class register_command ( command_name , command ) Add to the command registry. Parameters: command_name ( str ) \u2013 The command name. command ( CommandBasic ) \u2013 The command. Source code in automator/basic.py 24 25 26 27 28 29 30 31 def register_command ( self , command_name : str , command : CommandBasic ) -> None : \"\"\" Add to the command registry. :param command_name: The command name. :param command: The command. \"\"\" self . command_registry [ command_name ] = command self_command_mapping () Get the command-receiver mapping. Source code in automator/basic.py 40 41 42 43 44 def self_command_mapping ( self ) -> Dict [ str , CommandBasic ]: \"\"\" Get the command-receiver mapping. \"\"\" return { command_name : self for command_name in self . supported_command_names } Command The Command is a specific action that the Receiver can perform on the application. It encapsulates the function and parameters required to execute the action. The Command class is a base class for all commands in the Automator application. You can find the reference for a basic Command class below: Bases: ABC The abstract command interface. Initialize the command. Parameters: receiver ( ReceiverBasic ) \u2013 The receiver of the command. Source code in automator/basic.py 67 68 69 70 71 72 73 def __init__ ( self , receiver : ReceiverBasic , params : Dict = None ) -> None : \"\"\" Initialize the command. :param receiver: The receiver of the command. \"\"\" self . receiver = receiver self . params = params if params is not None else {} execute () abstractmethod Execute the command. Source code in automator/basic.py 75 76 77 78 79 80 @abstractmethod def execute ( self ): \"\"\" Execute the command. \"\"\" pass redo () Redo the command. Source code in automator/basic.py 88 89 90 91 92 def redo ( self ): \"\"\" Redo the command. \"\"\" self . execute () undo () Undo the command. Source code in automator/basic.py 82 83 84 85 86 def undo ( self ): \"\"\" Undo the command. \"\"\" pass Note Each command must register with a specific Receiver to be executed using the register_command decorator. For example: @ReceiverExample.register class CommandExample(CommandBasic): ... Invoker (AppPuppeteer) The AppPuppeteer plays the role of the invoker in the Automator application. It triggers the commands to be executed by the receivers. The AppPuppeteer equips the AppAgent with the capability to interact with the application's UI controls. It provides functionalities to translate action strings into specific actions and execute them. All available actions are registered in the Puppeteer with the ReceiverManager class. You can find the implementation of the AppPuppeteer class in the ufo/automator/puppeteer.py file, and its reference is shown below. The class for the app puppeteer to automate the app in the Windows environment. Initialize the app puppeteer. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The app root name, e.g., WINWORD.EXE. Source code in automator/puppeteer.py 22 23 24 25 26 27 28 29 30 31 32 def __init__ ( self , process_name : str , app_root_name : str ) -> None : \"\"\" Initialize the app puppeteer. :param process_name: The process name of the app. :param app_root_name: The app root name, e.g., WINWORD.EXE. \"\"\" self . _process_name = process_name self . _app_root_name = app_root_name self . command_queue : Deque [ CommandBasic ] = deque () self . receiver_manager = ReceiverManager () full_path : str property Get the full path of the process. Only works for COM receiver. Returns: str \u2013 The full path of the process. add_command ( command_name , params , * args , ** kwargs ) Add the command to the command queue. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Source code in automator/puppeteer.py 94 95 96 97 98 99 100 101 102 103 def add_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> None : \"\"\" Add the command to the command queue. :param command_name: The command name. :param params: The arguments. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) self . command_queue . append ( command ) close () Close the app. Only works for COM receiver. Source code in automator/puppeteer.py 145 146 147 148 149 150 151 def close ( self ) -> None : \"\"\" Close the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . com_receiver if com_receiver is not None : com_receiver . close () create_command ( command_name , params , * args , ** kwargs ) Create the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments for the command. Source code in automator/puppeteer.py 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def create_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> Optional [ CommandBasic ]: \"\"\" Create the command. :param command_name: The command name. :param params: The arguments for the command. \"\"\" receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) command = receiver . command_registry . get ( command_name . lower (), None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) if command is None : raise ValueError ( f \"Command { command_name } is not supported.\" ) return command ( receiver , params , * args , ** kwargs ) execute_all_commands () Execute all the commands in the command queue. Returns: List [ Any ] \u2013 The execution results. Source code in automator/puppeteer.py 82 83 84 85 86 87 88 89 90 91 92 def execute_all_commands ( self ) -> List [ Any ]: \"\"\" Execute all the commands in the command queue. :return: The execution results. \"\"\" results = [] while self . command_queue : command = self . command_queue . popleft () results . append ( command . execute ()) return results execute_command ( command_name , params , * args , ** kwargs ) Execute the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Returns: str \u2013 The execution result. Source code in automator/puppeteer.py 68 69 70 71 72 73 74 75 76 77 78 79 80 def execute_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> str : \"\"\" Execute the command. :param command_name: The command name. :param params: The arguments. :return: The execution result. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) return command . execute () get_command_queue_length () Get the length of the command queue. Returns: int \u2013 The length of the command queue. Source code in automator/puppeteer.py 105 106 107 108 109 110 def get_command_queue_length ( self ) -> int : \"\"\" Get the length of the command queue. :return: The length of the command queue. \"\"\" return len ( self . command_queue ) get_command_string ( command_name , params ) staticmethod Generate a function call string. Parameters: command_name ( str ) \u2013 The function name. params ( Dict [ str , str ] ) \u2013 The arguments as a dictionary. Returns: str \u2013 The function call string. Source code in automator/puppeteer.py 153 154 155 156 157 158 159 160 161 162 163 164 165 @staticmethod def get_command_string ( command_name : str , params : Dict [ str , str ]) -> str : \"\"\" Generate a function call string. :param command_name: The function name. :param params: The arguments as a dictionary. :return: The function call string. \"\"\" # Format the arguments args_str = \", \" . join ( f \" { k } = { v !r} \" for k , v in params . items ()) # Return the function call string return f \" { command_name } ( { args_str } )\" get_command_types ( command_name ) Get the command types. Parameters: command_name ( str ) \u2013 The command name. Returns: str \u2013 The command types. Source code in automator/puppeteer.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 def get_command_types ( self , command_name : str ) -> str : \"\"\" Get the command types. :param command_name: The command name. :return: The command types. \"\"\" try : receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) return receiver . type_name except : return \"\" save () Save the current state of the app. Only works for COM receiver. Source code in automator/puppeteer.py 124 125 126 127 128 129 130 def save ( self ) -> None : \"\"\" Save the current state of the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . com_receiver if com_receiver is not None : com_receiver . save () save_to_xml ( file_path ) Save the current state of the app to XML. Only works for COM receiver. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/puppeteer.py 132 133 134 135 136 137 138 139 140 141 142 143 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. Only works for COM receiver. :param file_path: The file path to save the XML. \"\"\" com_receiver = self . receiver_manager . com_receiver dir_path = os . path . dirname ( file_path ) if not os . path . exists ( dir_path ): os . makedirs ( dir_path ) if com_receiver is not None : com_receiver . save_to_xml ( file_path ) Receiver Manager The ReceiverManager manages all the receivers and commands in the Automator application. It provides functionalities to register and retrieve receivers and commands. It is a complementary component to the AppPuppeteer . The class for the receiver manager. Initialize the receiver manager. Source code in automator/puppeteer.py 175 176 177 178 179 180 181 182 183 def __init__ ( self ): \"\"\" Initialize the receiver manager. \"\"\" self . receiver_registry = {} self . ui_control_receiver : Optional [ ControlReceiver ] = None self . _receiver_list : List [ ReceiverBasic ] = [] com_receiver : WinCOMReceiverBasic property Get the COM receiver. Returns: WinCOMReceiverBasic \u2013 The COM receiver. receiver_factory_registry : Dict [ str , Dict [ str , Union [ str , ReceiverFactory ]]] property Get the receiver factory registry. Returns: Dict [ str , Dict [ str , Union [ str , ReceiverFactory ]]] \u2013 The receiver factory registry. receiver_list : List [ ReceiverBasic ] property Get the receiver list. Returns: List [ ReceiverBasic ] \u2013 The receiver list. create_api_receiver ( app_root_name , process_name ) Get the API receiver. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. Source code in automator/puppeteer.py 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 def create_api_receiver ( self , app_root_name : str , process_name : str ) -> None : \"\"\" Get the API receiver. :param app_root_name: The app root name. :param process_name: The process name. \"\"\" for receiver_factory_dict in self . receiver_factory_registry . values (): # Check if the receiver is API if receiver_factory_dict . get ( \"is_api\" ): receiver = receiver_factory_dict . get ( \"factory\" ) . create_receiver ( app_root_name , process_name ) if receiver is not None : self . receiver_list . append ( receiver ) self . _update_receiver_registry () create_ui_control_receiver ( control , application ) Build the UI controller. Parameters: control ( UIAWrapper ) \u2013 The control element. application ( UIAWrapper ) \u2013 The application window. Returns: ControlReceiver \u2013 The UI controller receiver. Source code in automator/puppeteer.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 def create_ui_control_receiver ( self , control : UIAWrapper , application : UIAWrapper ) -> \"ControlReceiver\" : \"\"\" Build the UI controller. :param control: The control element. :param application: The application window. :return: The UI controller receiver. \"\"\" # control can be None if not application : return None factory : ReceiverFactory = self . receiver_factory_registry . get ( \"UIControl\" ) . get ( \"factory\" ) self . ui_control_receiver = factory . create_receiver ( control , application ) self . receiver_list . append ( self . ui_control_receiver ) self . _update_receiver_registry () return self . ui_control_receiver get_receiver_from_command_name ( command_name ) Get the receiver from the command name. Parameters: command_name ( str ) \u2013 The command name. Returns: ReceiverBasic \u2013 The mapped receiver. Source code in automator/puppeteer.py 235 236 237 238 239 240 241 242 243 244 def get_receiver_from_command_name ( self , command_name : str ) -> ReceiverBasic : \"\"\" Get the receiver from the command name. :param command_name: The command name. :return: The mapped receiver. \"\"\" receiver = self . receiver_registry . get ( command_name , None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) return receiver register ( receiver_factory_class ) classmethod Decorator to register the receiver factory class to the receiver manager. Parameters: receiver_factory_class ( Type [ ReceiverFactory ] ) \u2013 The receiver factory class to be registered. Returns: ReceiverFactory \u2013 The receiver factory class instance. Source code in automator/puppeteer.py 276 277 278 279 280 281 282 283 284 285 286 287 288 289 @classmethod def register ( cls , receiver_factory_class : Type [ ReceiverFactory ]) -> ReceiverFactory : \"\"\" Decorator to register the receiver factory class to the receiver manager. :param receiver_factory_class: The receiver factory class to be registered. :return: The receiver factory class instance. \"\"\" cls . _receiver_factory_registry [ receiver_factory_class . name ()] = { \"factory\" : receiver_factory_class (), \"is_api\" : receiver_factory_class . is_api (), } return receiver_factory_class () For further details, refer to the specific documentation for each component and class in the Automator module.","title":"Overview"},{"location":"automator/overview/#application-automator","text":"The Automator application is a tool that allows UFO to automate and take actions on applications. Currently, UFO supports two types of actions: UI Automation and API . Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. UI Automator - This action type is used to interact with the application's UI controls, such as buttons, text boxes, and menus. UFO uses the UIA or Win32 APIs to interact with the application's UI controls. API - This action type is used to interact with the application's native API. Users and app developers can create their own API actions to interact with specific applications. Web - This action type is used to interact with web applications. UFO uses the crawl4ai library to extract information from web pages. Bash - This action type is used to interact with the command line interface (CLI) of an application. AI Tool - This action type is used to interact with the LLM-based AI tools.","title":"Application Automator"},{"location":"automator/overview/#action-design-patterns","text":"Actions in UFO are implemented using the command design pattern, which encapsulates a receiver, a command, and an invoker. The receiver is the object that performs the action, the command is the object that encapsulates the action, and the invoker is the object that triggers the action. The basic classes for implementing actions in UFO are as follows: Role Class Description Receiver ufo.automator.basic.ReceiverBasic The base class for all receivers in UFO. Receivers are objects that perform actions on applications. Command ufo.automator.basic.CommandBasic The base class for all commands in UFO. Commands are objects that encapsulate actions to be performed by receivers. Invoker ufo.automator.puppeteer.AppPuppeteer The base class for the invoker in UFO. Invokers are objects that trigger commands to be executed by receivers. The advantage of using the command design pattern in the agent framework is that it allows for the decoupling of the sender and receiver of the action. This decoupling enables the agent to execute actions on different objects without knowing the details of the object or the action being performed, making the agent more flexible and extensible for new actions.","title":"Action Design Patterns"},{"location":"automator/overview/#receiver","text":"The Receiver is a central component in the Automator application that performs actions on the application. It provides functionalities to interact with the application and execute the action. All available actions are registered in the with the ReceiverManager class. You can find the reference for a basic Receiver class below: Bases: ABC The abstract receiver interface.","title":"Receiver"},{"location":"automator/overview/#automator.basic.ReceiverBasic.command_registry","text":"Get the command registry.","title":"command_registry"},{"location":"automator/overview/#automator.basic.ReceiverBasic.supported_command_names","text":"Get the command name list.","title":"supported_command_names"},{"location":"automator/overview/#automator.basic.ReceiverBasic.register","text":"Decorator to register the state class to the state manager. Parameters: command_class ( Type [ CommandBasic ] ) \u2013 The state class to be registered. Returns: Type [ CommandBasic ] \u2013 The state class. Source code in automator/basic.py 46 47 48 49 50 51 52 53 54 @classmethod def register ( cls , command_class : Type [ CommandBasic ]) -> Type [ CommandBasic ]: \"\"\" Decorator to register the state class to the state manager. :param command_class: The state class to be registered. :return: The state class. \"\"\" cls . _command_registry [ command_class . name ()] = command_class return command_class","title":"register"},{"location":"automator/overview/#automator.basic.ReceiverBasic.register_command","text":"Add to the command registry. Parameters: command_name ( str ) \u2013 The command name. command ( CommandBasic ) \u2013 The command. Source code in automator/basic.py 24 25 26 27 28 29 30 31 def register_command ( self , command_name : str , command : CommandBasic ) -> None : \"\"\" Add to the command registry. :param command_name: The command name. :param command: The command. \"\"\" self . command_registry [ command_name ] = command","title":"register_command"},{"location":"automator/overview/#automator.basic.ReceiverBasic.self_command_mapping","text":"Get the command-receiver mapping. Source code in automator/basic.py 40 41 42 43 44 def self_command_mapping ( self ) -> Dict [ str , CommandBasic ]: \"\"\" Get the command-receiver mapping. \"\"\" return { command_name : self for command_name in self . supported_command_names }","title":"self_command_mapping"},{"location":"automator/overview/#command","text":"The Command is a specific action that the Receiver can perform on the application. It encapsulates the function and parameters required to execute the action. The Command class is a base class for all commands in the Automator application. You can find the reference for a basic Command class below: Bases: ABC The abstract command interface. Initialize the command. Parameters: receiver ( ReceiverBasic ) \u2013 The receiver of the command. Source code in automator/basic.py 67 68 69 70 71 72 73 def __init__ ( self , receiver : ReceiverBasic , params : Dict = None ) -> None : \"\"\" Initialize the command. :param receiver: The receiver of the command. \"\"\" self . receiver = receiver self . params = params if params is not None else {}","title":"Command"},{"location":"automator/overview/#automator.basic.CommandBasic.execute","text":"Execute the command. Source code in automator/basic.py 75 76 77 78 79 80 @abstractmethod def execute ( self ): \"\"\" Execute the command. \"\"\" pass","title":"execute"},{"location":"automator/overview/#automator.basic.CommandBasic.redo","text":"Redo the command. Source code in automator/basic.py 88 89 90 91 92 def redo ( self ): \"\"\" Redo the command. \"\"\" self . execute ()","title":"redo"},{"location":"automator/overview/#automator.basic.CommandBasic.undo","text":"Undo the command. Source code in automator/basic.py 82 83 84 85 86 def undo ( self ): \"\"\" Undo the command. \"\"\" pass Note Each command must register with a specific Receiver to be executed using the register_command decorator. For example: @ReceiverExample.register class CommandExample(CommandBasic): ...","title":"undo"},{"location":"automator/overview/#invoker-apppuppeteer","text":"The AppPuppeteer plays the role of the invoker in the Automator application. It triggers the commands to be executed by the receivers. The AppPuppeteer equips the AppAgent with the capability to interact with the application's UI controls. It provides functionalities to translate action strings into specific actions and execute them. All available actions are registered in the Puppeteer with the ReceiverManager class. You can find the implementation of the AppPuppeteer class in the ufo/automator/puppeteer.py file, and its reference is shown below. The class for the app puppeteer to automate the app in the Windows environment. Initialize the app puppeteer. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The app root name, e.g., WINWORD.EXE. Source code in automator/puppeteer.py 22 23 24 25 26 27 28 29 30 31 32 def __init__ ( self , process_name : str , app_root_name : str ) -> None : \"\"\" Initialize the app puppeteer. :param process_name: The process name of the app. :param app_root_name: The app root name, e.g., WINWORD.EXE. \"\"\" self . _process_name = process_name self . _app_root_name = app_root_name self . command_queue : Deque [ CommandBasic ] = deque () self . receiver_manager = ReceiverManager ()","title":"Invoker (AppPuppeteer)"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.full_path","text":"Get the full path of the process. Only works for COM receiver. Returns: str \u2013 The full path of the process.","title":"full_path"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.add_command","text":"Add the command to the command queue. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Source code in automator/puppeteer.py 94 95 96 97 98 99 100 101 102 103 def add_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> None : \"\"\" Add the command to the command queue. :param command_name: The command name. :param params: The arguments. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) self . command_queue . append ( command )","title":"add_command"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.close","text":"Close the app. Only works for COM receiver. Source code in automator/puppeteer.py 145 146 147 148 149 150 151 def close ( self ) -> None : \"\"\" Close the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . com_receiver if com_receiver is not None : com_receiver . close ()","title":"close"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.create_command","text":"Create the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments for the command. Source code in automator/puppeteer.py 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def create_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> Optional [ CommandBasic ]: \"\"\" Create the command. :param command_name: The command name. :param params: The arguments for the command. \"\"\" receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) command = receiver . command_registry . get ( command_name . lower (), None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) if command is None : raise ValueError ( f \"Command { command_name } is not supported.\" ) return command ( receiver , params , * args , ** kwargs )","title":"create_command"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.execute_all_commands","text":"Execute all the commands in the command queue. Returns: List [ Any ] \u2013 The execution results. Source code in automator/puppeteer.py 82 83 84 85 86 87 88 89 90 91 92 def execute_all_commands ( self ) -> List [ Any ]: \"\"\" Execute all the commands in the command queue. :return: The execution results. \"\"\" results = [] while self . command_queue : command = self . command_queue . popleft () results . append ( command . execute ()) return results","title":"execute_all_commands"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.execute_command","text":"Execute the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Returns: str \u2013 The execution result. Source code in automator/puppeteer.py 68 69 70 71 72 73 74 75 76 77 78 79 80 def execute_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> str : \"\"\" Execute the command. :param command_name: The command name. :param params: The arguments. :return: The execution result. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) return command . execute ()","title":"execute_command"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.get_command_queue_length","text":"Get the length of the command queue. Returns: int \u2013 The length of the command queue. Source code in automator/puppeteer.py 105 106 107 108 109 110 def get_command_queue_length ( self ) -> int : \"\"\" Get the length of the command queue. :return: The length of the command queue. \"\"\" return len ( self . command_queue )","title":"get_command_queue_length"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.get_command_string","text":"Generate a function call string. Parameters: command_name ( str ) \u2013 The function name. params ( Dict [ str , str ] ) \u2013 The arguments as a dictionary. Returns: str \u2013 The function call string. Source code in automator/puppeteer.py 153 154 155 156 157 158 159 160 161 162 163 164 165 @staticmethod def get_command_string ( command_name : str , params : Dict [ str , str ]) -> str : \"\"\" Generate a function call string. :param command_name: The function name. :param params: The arguments as a dictionary. :return: The function call string. \"\"\" # Format the arguments args_str = \", \" . join ( f \" { k } = { v !r} \" for k , v in params . items ()) # Return the function call string return f \" { command_name } ( { args_str } )\"","title":"get_command_string"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.get_command_types","text":"Get the command types. Parameters: command_name ( str ) \u2013 The command name. Returns: str \u2013 The command types. Source code in automator/puppeteer.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 def get_command_types ( self , command_name : str ) -> str : \"\"\" Get the command types. :param command_name: The command name. :return: The command types. \"\"\" try : receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) return receiver . type_name except : return \"\"","title":"get_command_types"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.save","text":"Save the current state of the app. Only works for COM receiver. Source code in automator/puppeteer.py 124 125 126 127 128 129 130 def save ( self ) -> None : \"\"\" Save the current state of the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . com_receiver if com_receiver is not None : com_receiver . save ()","title":"save"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.save_to_xml","text":"Save the current state of the app to XML. Only works for COM receiver. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/puppeteer.py 132 133 134 135 136 137 138 139 140 141 142 143 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. Only works for COM receiver. :param file_path: The file path to save the XML. \"\"\" com_receiver = self . receiver_manager . com_receiver dir_path = os . path . dirname ( file_path ) if not os . path . exists ( dir_path ): os . makedirs ( dir_path ) if com_receiver is not None : com_receiver . save_to_xml ( file_path )","title":"save_to_xml"},{"location":"automator/overview/#receiver-manager","text":"The ReceiverManager manages all the receivers and commands in the Automator application. It provides functionalities to register and retrieve receivers and commands. It is a complementary component to the AppPuppeteer . The class for the receiver manager. Initialize the receiver manager. Source code in automator/puppeteer.py 175 176 177 178 179 180 181 182 183 def __init__ ( self ): \"\"\" Initialize the receiver manager. \"\"\" self . receiver_registry = {} self . ui_control_receiver : Optional [ ControlReceiver ] = None self . _receiver_list : List [ ReceiverBasic ] = []","title":"Receiver Manager"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.com_receiver","text":"Get the COM receiver. Returns: WinCOMReceiverBasic \u2013 The COM receiver.","title":"com_receiver"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.receiver_factory_registry","text":"Get the receiver factory registry. Returns: Dict [ str , Dict [ str , Union [ str , ReceiverFactory ]]] \u2013 The receiver factory registry.","title":"receiver_factory_registry"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.receiver_list","text":"Get the receiver list. Returns: List [ ReceiverBasic ] \u2013 The receiver list.","title":"receiver_list"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.create_api_receiver","text":"Get the API receiver. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. Source code in automator/puppeteer.py 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 def create_api_receiver ( self , app_root_name : str , process_name : str ) -> None : \"\"\" Get the API receiver. :param app_root_name: The app root name. :param process_name: The process name. \"\"\" for receiver_factory_dict in self . receiver_factory_registry . values (): # Check if the receiver is API if receiver_factory_dict . get ( \"is_api\" ): receiver = receiver_factory_dict . get ( \"factory\" ) . create_receiver ( app_root_name , process_name ) if receiver is not None : self . receiver_list . append ( receiver ) self . _update_receiver_registry ()","title":"create_api_receiver"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.create_ui_control_receiver","text":"Build the UI controller. Parameters: control ( UIAWrapper ) \u2013 The control element. application ( UIAWrapper ) \u2013 The application window. Returns: ControlReceiver \u2013 The UI controller receiver. Source code in automator/puppeteer.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 def create_ui_control_receiver ( self , control : UIAWrapper , application : UIAWrapper ) -> \"ControlReceiver\" : \"\"\" Build the UI controller. :param control: The control element. :param application: The application window. :return: The UI controller receiver. \"\"\" # control can be None if not application : return None factory : ReceiverFactory = self . receiver_factory_registry . get ( \"UIControl\" ) . get ( \"factory\" ) self . ui_control_receiver = factory . create_receiver ( control , application ) self . receiver_list . append ( self . ui_control_receiver ) self . _update_receiver_registry () return self . ui_control_receiver","title":"create_ui_control_receiver"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.get_receiver_from_command_name","text":"Get the receiver from the command name. Parameters: command_name ( str ) \u2013 The command name. Returns: ReceiverBasic \u2013 The mapped receiver. Source code in automator/puppeteer.py 235 236 237 238 239 240 241 242 243 244 def get_receiver_from_command_name ( self , command_name : str ) -> ReceiverBasic : \"\"\" Get the receiver from the command name. :param command_name: The command name. :return: The mapped receiver. \"\"\" receiver = self . receiver_registry . get ( command_name , None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) return receiver","title":"get_receiver_from_command_name"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.register","text":"Decorator to register the receiver factory class to the receiver manager. Parameters: receiver_factory_class ( Type [ ReceiverFactory ] ) \u2013 The receiver factory class to be registered. Returns: ReceiverFactory \u2013 The receiver factory class instance. Source code in automator/puppeteer.py 276 277 278 279 280 281 282 283 284 285 286 287 288 289 @classmethod def register ( cls , receiver_factory_class : Type [ ReceiverFactory ]) -> ReceiverFactory : \"\"\" Decorator to register the receiver factory class to the receiver manager. :param receiver_factory_class: The receiver factory class to be registered. :return: The receiver factory class instance. \"\"\" cls . _receiver_factory_registry [ receiver_factory_class . name ()] = { \"factory\" : receiver_factory_class (), \"is_api\" : receiver_factory_class . is_api (), } return receiver_factory_class () For further details, refer to the specific documentation for each component and class in the Automator module.","title":"register"},{"location":"automator/ui_automator/","text":"UI Automator The UI Automator enables to mimic the operations of mouse and keyboard on the application's UI controls. UFO uses the UIA or Win32 APIs to interact with the application's UI controls, such as buttons, edit boxes, and menus. Configuration There are several configurations that need to be set up before using the UI Automator in the config_dev.yaml file. Below is the list of configurations related to the UI Automator: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" CLICK_API The API used for click action, can be click_input or click . String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text. Boolean False Receiver The receiver of the UI Automator is the ControlReceiver class defined in the ufo/automator/ui_control/controller/control_receiver module. It is initialized with the application's window handle and control wrapper that executes the actions. The ControlReceiver provides functionalities to interact with the application's UI controls. Below is the reference for the ControlReceiver class: Bases: ReceiverBasic The control receiver class. Initialize the control receiver. Parameters: control ( Optional [ UIAWrapper ] ) \u2013 The control element. application ( Optional [ UIAWrapper ] ) \u2013 The application element. Source code in automator/ui_control/controller.py 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 def __init__ ( self , control : Optional [ UIAWrapper ], application : Optional [ UIAWrapper ] ) -> None : \"\"\" Initialize the control receiver. :param control: The control element. :param application: The application element. \"\"\" self . control = control self . application = application if control : self . control . set_focus () self . wait_enabled () elif application : self . application . set_focus () annotation ( params , annotation_dict ) Take a screenshot of the current application window and annotate the control item on the screenshot. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the annotation method. annotation_dict ( Dict [ str , UIAWrapper ] ) \u2013 The dictionary of the control labels. Source code in automator/ui_control/controller.py 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 def annotation ( self , params : Dict [ str , str ], annotation_dict : Dict [ str , UIAWrapper ] ) -> List [ str ]: \"\"\" Take a screenshot of the current application window and annotate the control item on the screenshot. :param params: The arguments of the annotation method. :param annotation_dict: The dictionary of the control labels. \"\"\" selected_controls_labels = params . get ( \"control_labels\" , []) control_reannotate = [ annotation_dict [ str ( label )] for label in selected_controls_labels ] return control_reannotate atomic_execution ( method_name , params ) Atomic execution of the action on the control elements. Parameters: method_name ( str ) \u2013 The name of the method to execute. params ( Dict [ str , Any ] ) \u2013 The arguments of the method. Returns: str \u2013 The result of the action. Source code in automator/ui_control/controller.py 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def atomic_execution ( self , method_name : str , params : Dict [ str , Any ]) -> str : \"\"\" Atomic execution of the action on the control elements. :param method_name: The name of the method to execute. :param params: The arguments of the method. :return: The result of the action. \"\"\" import traceback try : method = getattr ( self . control , method_name ) result = method ( ** params ) except AttributeError : message = f \" { self . control } doesn't have a method named { method_name } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message except Exception as e : full_traceback = traceback . format_exc () message = f \"An error occurred: { full_traceback } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message return result click_input ( params ) Click the control element. Parameters: params ( Dict [ str , Union [ str , bool ]] ) \u2013 The arguments of the click method. Returns: str \u2013 The result of the click action. Source code in automator/ui_control/controller.py 79 80 81 82 83 84 85 86 87 88 89 90 91 def click_input ( self , params : Dict [ str , Union [ str , bool ]]) -> str : \"\"\" Click the control element. :param params: The arguments of the click method. :return: The result of the click action. \"\"\" api_name = configs . get ( \"CLICK_API\" , \"click_input\" ) if api_name == \"click\" : return self . atomic_execution ( \"click\" , params ) else : return self . atomic_execution ( \"click_input\" , params ) click_on_coordinates ( params ) Click on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the click on coordinates method. Returns: str \u2013 The result of the click on coordinates action. Source code in automator/ui_control/controller.py 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 def click_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Click on the coordinates of the control element. :param params: The arguments of the click on coordinates method. :return: The result of the click on coordinates action. \"\"\" # Get the relative coordinates fraction of the application window. x = float ( params . get ( \"x\" , 0 )) y = float ( params . get ( \"y\" , 0 )) button = params . get ( \"button\" , \"left\" ) double = params . get ( \"double\" , False ) # Get the absolute coordinates of the application window. tranformed_x , tranformed_y = self . transform_point ( x , y ) self . application . set_focus () pyautogui . click ( tranformed_x , tranformed_y , button = button , clicks = 2 if double else 1 ) return \"\" drag_on_coordinates ( params ) Drag on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the drag on coordinates method. Returns: str \u2013 The result of the drag on coordinates action. Source code in automator/ui_control/controller.py 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 def drag_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Drag on the coordinates of the control element. :param params: The arguments of the drag on coordinates method. :return: The result of the drag on coordinates action. \"\"\" start = self . transform_point ( float ( params . get ( \"start_x\" , 0 )), float ( params . get ( \"start_y\" , 0 )) ) end = self . transform_point ( float ( params . get ( \"end_x\" , 0 )), float ( params . get ( \"end_y\" , 0 )) ) button = params . get ( \"button\" , \"left\" ) self . application . set_focus () pyautogui . moveTo ( start [ 0 ], start [ 1 ]) pyautogui . dragTo ( end [ 0 ], end [ 1 ], button = button ) return \"\" keyboard_input ( params ) Keyboard input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the keyboard input method. Returns: str \u2013 The result of the keyboard input action. Source code in automator/ui_control/controller.py 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 def keyboard_input ( self , params : Dict [ str , str ]) -> str : \"\"\" Keyboard input on the control element. :param params: The arguments of the keyboard input method. :return: The result of the keyboard input action. \"\"\" control_focus = params . get ( \"control_focus\" , True ) keys = params . get ( \"keys\" , \"\" ) if control_focus : self . atomic_execution ( \"type_keys\" , { \"keys\" : keys }) else : pyautogui . typewrite ( keys ) return keys no_action () No action on the control element. Returns: \u2013 The result of the no action. Source code in automator/ui_control/controller.py 232 233 234 235 236 237 238 def no_action ( self ): \"\"\" No action on the control element. :return: The result of the no action. \"\"\" return \"\" set_edit_text ( params ) Set the edit text of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the set edit text method. Returns: str \u2013 The result of the set edit text action. Source code in automator/ui_control/controller.py 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 def set_edit_text ( self , params : Dict [ str , str ]) -> str : \"\"\" Set the edit text of the control element. :param params: The arguments of the set edit text method. :return: The result of the set edit text action. \"\"\" text = params . get ( \"text\" , \"\" ) inter_key_pause = configs . get ( \"INPUT_TEXT_INTER_KEY_PAUSE\" , 0.1 ) if configs [ \"INPUT_TEXT_API\" ] == \"set_text\" : method_name = \"set_edit_text\" args = { \"text\" : text } else : method_name = \"type_keys\" # Transform the text according to the tags. text = TextTransformer . transform_text ( text , \"all\" ) args = { \"keys\" : text , \"pause\" : inter_key_pause , \"with_spaces\" : True } try : result = self . atomic_execution ( method_name , args ) if ( method_name == \"set_text\" and args [ \"text\" ] not in self . control . window_text () ): raise Exception ( f \"Failed to use set_text: { args [ 'text' ] } \" ) if configs [ \"INPUT_TEXT_ENTER\" ] and method_name in [ \"type_keys\" , \"set_text\" ]: self . atomic_execution ( \"type_keys\" , params = { \"keys\" : \" {ENTER} \" }) return result except Exception as e : if method_name == \"set_text\" : print_with_color ( f \" { self . control } doesn't have a method named { method_name } , trying default input method\" , \"yellow\" , ) method_name = \"type_keys\" clear_text_keys = \"^a {BACKSPACE} \" text_to_type = args [ \"text\" ] keys_to_send = clear_text_keys + text_to_type method_name = \"type_keys\" args = { \"keys\" : keys_to_send , \"pause\" : inter_key_pause , \"with_spaces\" : True , } return self . atomic_execution ( method_name , args ) else : return f \"An error occurred: { e } \" summary ( params ) Visual summary of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the visual summary method. should contain a key \"text\" with the text summary. Returns: str \u2013 The result of the visual summary action. Source code in automator/ui_control/controller.py 141 142 143 144 145 146 147 148 def summary ( self , params : Dict [ str , str ]) -> str : \"\"\" Visual summary of the control element. :param params: The arguments of the visual summary method. should contain a key \"text\" with the text summary. :return: The result of the visual summary action. \"\"\" return params . get ( \"text\" ) texts () Get the text of the control element. Returns: str \u2013 The text of the control element. Source code in automator/ui_control/controller.py 217 218 219 220 221 222 def texts ( self ) -> str : \"\"\" Get the text of the control element. :return: The text of the control element. \"\"\" return self . control . texts () transform_point ( fraction_x , fraction_y ) Transform the relative coordinates to the absolute coordinates. Parameters: fraction_x ( float ) \u2013 The relative x coordinate. fraction_y ( float ) \u2013 The relative y coordinate. Returns: Tuple [ int , int ] \u2013 The absolute coordinates. Source code in automator/ui_control/controller.py 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 def transform_point ( self , fraction_x : float , fraction_y : float ) -> Tuple [ int , int ]: \"\"\" Transform the relative coordinates to the absolute coordinates. :param fraction_x: The relative x coordinate. :param fraction_y: The relative y coordinate. :return: The absolute coordinates. \"\"\" application_rect : RECT = self . application . rectangle () application_x = application_rect . left application_y = application_rect . top application_width = application_rect . width () application_height = application_rect . height () x = application_x + int ( application_width * fraction_x ) y = application_y + int ( application_height * fraction_y ) return x , y wait_enabled ( timeout = 10 , retry_interval = 0.5 ) Wait until the control is enabled. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 256 257 258 259 260 261 262 263 264 265 266 267 def wait_enabled ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the control is enabled. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_enabled (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not enabled.\" ) break wait_visible ( timeout = 10 , retry_interval = 0.5 ) Wait until the window is enabled. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 269 270 271 272 273 274 275 276 277 278 279 280 def wait_visible ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the window is enabled. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_visible (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not visible.\" ) break wheel_mouse_input ( params ) Wheel mouse input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the wheel mouse input method. Returns: \u2013 The result of the wheel mouse input action. Source code in automator/ui_control/controller.py 224 225 226 227 228 229 230 def wheel_mouse_input ( self , params : Dict [ str , str ]): \"\"\" Wheel mouse input on the control element. :param params: The arguments of the wheel mouse input method. :return: The result of the wheel mouse input action. \"\"\" return self . atomic_execution ( \"wheel_mouse_input\" , params ) Command The command of the UI Automator is the ControlCommand class defined in the ufo/automator/ui_control/controller/ControlCommand module. It encapsulates the function and parameters required to execute the action. The ControlCommand class is a base class for all commands in the UI Automator application. Below is an example of a ClickInputCommand class that inherits from the ControlCommand class: @ControlReceiver.register class ClickInputCommand(ControlCommand): \"\"\" The click input command class. \"\"\" def execute(self) -> str: \"\"\" Execute the click input command. :return: The result of the click input command. \"\"\" return self.receiver.click_input(self.params) @classmethod def name(cls) -> str: \"\"\" Get the name of the atomic command. :return: The name of the atomic command. \"\"\" return \"click_input\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a specific ControlReceiver to be executed using the @ControlReceiver.register decorator. Below is the list of available commands in the UI Automator that are currently supported by UFO: Command Name Function Name Description ClickInputCommand click_input Click the control item with the mouse. ClickOnCoordinatesCommand click_on_coordinates Click on the specific fractional coordinates of the application window. DragOnCoordinatesCommand drag_on_coordinates Drag the mouse on the specific fractional coordinates of the application window. SetEditTextCommand set_edit_text Add new text to the control item. GetTextsCommand texts Get the text of the control item. WheelMouseInputCommand wheel_mouse_input Scroll the control item. KeyboardInputCommand keyboard_input Simulate the keyboard input. Tip Please refer to the ufo/prompts/share/base/api.yaml file for the detailed API documentation of the UI Automator. Tip You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand module.","title":"UI Automator"},{"location":"automator/ui_automator/#ui-automator","text":"The UI Automator enables to mimic the operations of mouse and keyboard on the application's UI controls. UFO uses the UIA or Win32 APIs to interact with the application's UI controls, such as buttons, edit boxes, and menus.","title":"UI Automator"},{"location":"automator/ui_automator/#configuration","text":"There are several configurations that need to be set up before using the UI Automator in the config_dev.yaml file. Below is the list of configurations related to the UI Automator: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" CLICK_API The API used for click action, can be click_input or click . String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text. Boolean False","title":"Configuration"},{"location":"automator/ui_automator/#receiver","text":"The receiver of the UI Automator is the ControlReceiver class defined in the ufo/automator/ui_control/controller/control_receiver module. It is initialized with the application's window handle and control wrapper that executes the actions. The ControlReceiver provides functionalities to interact with the application's UI controls. Below is the reference for the ControlReceiver class: Bases: ReceiverBasic The control receiver class. Initialize the control receiver. Parameters: control ( Optional [ UIAWrapper ] ) \u2013 The control element. application ( Optional [ UIAWrapper ] ) \u2013 The application element. Source code in automator/ui_control/controller.py 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 def __init__ ( self , control : Optional [ UIAWrapper ], application : Optional [ UIAWrapper ] ) -> None : \"\"\" Initialize the control receiver. :param control: The control element. :param application: The application element. \"\"\" self . control = control self . application = application if control : self . control . set_focus () self . wait_enabled () elif application : self . application . set_focus ()","title":"Receiver"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.annotation","text":"Take a screenshot of the current application window and annotate the control item on the screenshot. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the annotation method. annotation_dict ( Dict [ str , UIAWrapper ] ) \u2013 The dictionary of the control labels. Source code in automator/ui_control/controller.py 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 def annotation ( self , params : Dict [ str , str ], annotation_dict : Dict [ str , UIAWrapper ] ) -> List [ str ]: \"\"\" Take a screenshot of the current application window and annotate the control item on the screenshot. :param params: The arguments of the annotation method. :param annotation_dict: The dictionary of the control labels. \"\"\" selected_controls_labels = params . get ( \"control_labels\" , []) control_reannotate = [ annotation_dict [ str ( label )] for label in selected_controls_labels ] return control_reannotate","title":"annotation"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.atomic_execution","text":"Atomic execution of the action on the control elements. Parameters: method_name ( str ) \u2013 The name of the method to execute. params ( Dict [ str , Any ] ) \u2013 The arguments of the method. Returns: str \u2013 The result of the action. Source code in automator/ui_control/controller.py 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def atomic_execution ( self , method_name : str , params : Dict [ str , Any ]) -> str : \"\"\" Atomic execution of the action on the control elements. :param method_name: The name of the method to execute. :param params: The arguments of the method. :return: The result of the action. \"\"\" import traceback try : method = getattr ( self . control , method_name ) result = method ( ** params ) except AttributeError : message = f \" { self . control } doesn't have a method named { method_name } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message except Exception as e : full_traceback = traceback . format_exc () message = f \"An error occurred: { full_traceback } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message return result","title":"atomic_execution"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.click_input","text":"Click the control element. Parameters: params ( Dict [ str , Union [ str , bool ]] ) \u2013 The arguments of the click method. Returns: str \u2013 The result of the click action. Source code in automator/ui_control/controller.py 79 80 81 82 83 84 85 86 87 88 89 90 91 def click_input ( self , params : Dict [ str , Union [ str , bool ]]) -> str : \"\"\" Click the control element. :param params: The arguments of the click method. :return: The result of the click action. \"\"\" api_name = configs . get ( \"CLICK_API\" , \"click_input\" ) if api_name == \"click\" : return self . atomic_execution ( \"click\" , params ) else : return self . atomic_execution ( \"click_input\" , params )","title":"click_input"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.click_on_coordinates","text":"Click on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the click on coordinates method. Returns: str \u2013 The result of the click on coordinates action. Source code in automator/ui_control/controller.py 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 def click_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Click on the coordinates of the control element. :param params: The arguments of the click on coordinates method. :return: The result of the click on coordinates action. \"\"\" # Get the relative coordinates fraction of the application window. x = float ( params . get ( \"x\" , 0 )) y = float ( params . get ( \"y\" , 0 )) button = params . get ( \"button\" , \"left\" ) double = params . get ( \"double\" , False ) # Get the absolute coordinates of the application window. tranformed_x , tranformed_y = self . transform_point ( x , y ) self . application . set_focus () pyautogui . click ( tranformed_x , tranformed_y , button = button , clicks = 2 if double else 1 ) return \"\"","title":"click_on_coordinates"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.drag_on_coordinates","text":"Drag on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the drag on coordinates method. Returns: str \u2013 The result of the drag on coordinates action. Source code in automator/ui_control/controller.py 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 def drag_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Drag on the coordinates of the control element. :param params: The arguments of the drag on coordinates method. :return: The result of the drag on coordinates action. \"\"\" start = self . transform_point ( float ( params . get ( \"start_x\" , 0 )), float ( params . get ( \"start_y\" , 0 )) ) end = self . transform_point ( float ( params . get ( \"end_x\" , 0 )), float ( params . get ( \"end_y\" , 0 )) ) button = params . get ( \"button\" , \"left\" ) self . application . set_focus () pyautogui . moveTo ( start [ 0 ], start [ 1 ]) pyautogui . dragTo ( end [ 0 ], end [ 1 ], button = button ) return \"\"","title":"drag_on_coordinates"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.keyboard_input","text":"Keyboard input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the keyboard input method. Returns: str \u2013 The result of the keyboard input action. Source code in automator/ui_control/controller.py 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 def keyboard_input ( self , params : Dict [ str , str ]) -> str : \"\"\" Keyboard input on the control element. :param params: The arguments of the keyboard input method. :return: The result of the keyboard input action. \"\"\" control_focus = params . get ( \"control_focus\" , True ) keys = params . get ( \"keys\" , \"\" ) if control_focus : self . atomic_execution ( \"type_keys\" , { \"keys\" : keys }) else : pyautogui . typewrite ( keys ) return keys","title":"keyboard_input"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.no_action","text":"No action on the control element. Returns: \u2013 The result of the no action. Source code in automator/ui_control/controller.py 232 233 234 235 236 237 238 def no_action ( self ): \"\"\" No action on the control element. :return: The result of the no action. \"\"\" return \"\"","title":"no_action"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.set_edit_text","text":"Set the edit text of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the set edit text method. Returns: str \u2013 The result of the set edit text action. Source code in automator/ui_control/controller.py 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 def set_edit_text ( self , params : Dict [ str , str ]) -> str : \"\"\" Set the edit text of the control element. :param params: The arguments of the set edit text method. :return: The result of the set edit text action. \"\"\" text = params . get ( \"text\" , \"\" ) inter_key_pause = configs . get ( \"INPUT_TEXT_INTER_KEY_PAUSE\" , 0.1 ) if configs [ \"INPUT_TEXT_API\" ] == \"set_text\" : method_name = \"set_edit_text\" args = { \"text\" : text } else : method_name = \"type_keys\" # Transform the text according to the tags. text = TextTransformer . transform_text ( text , \"all\" ) args = { \"keys\" : text , \"pause\" : inter_key_pause , \"with_spaces\" : True } try : result = self . atomic_execution ( method_name , args ) if ( method_name == \"set_text\" and args [ \"text\" ] not in self . control . window_text () ): raise Exception ( f \"Failed to use set_text: { args [ 'text' ] } \" ) if configs [ \"INPUT_TEXT_ENTER\" ] and method_name in [ \"type_keys\" , \"set_text\" ]: self . atomic_execution ( \"type_keys\" , params = { \"keys\" : \" {ENTER} \" }) return result except Exception as e : if method_name == \"set_text\" : print_with_color ( f \" { self . control } doesn't have a method named { method_name } , trying default input method\" , \"yellow\" , ) method_name = \"type_keys\" clear_text_keys = \"^a {BACKSPACE} \" text_to_type = args [ \"text\" ] keys_to_send = clear_text_keys + text_to_type method_name = \"type_keys\" args = { \"keys\" : keys_to_send , \"pause\" : inter_key_pause , \"with_spaces\" : True , } return self . atomic_execution ( method_name , args ) else : return f \"An error occurred: { e } \"","title":"set_edit_text"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.summary","text":"Visual summary of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the visual summary method. should contain a key \"text\" with the text summary. Returns: str \u2013 The result of the visual summary action. Source code in automator/ui_control/controller.py 141 142 143 144 145 146 147 148 def summary ( self , params : Dict [ str , str ]) -> str : \"\"\" Visual summary of the control element. :param params: The arguments of the visual summary method. should contain a key \"text\" with the text summary. :return: The result of the visual summary action. \"\"\" return params . get ( \"text\" )","title":"summary"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.texts","text":"Get the text of the control element. Returns: str \u2013 The text of the control element. Source code in automator/ui_control/controller.py 217 218 219 220 221 222 def texts ( self ) -> str : \"\"\" Get the text of the control element. :return: The text of the control element. \"\"\" return self . control . texts ()","title":"texts"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.transform_point","text":"Transform the relative coordinates to the absolute coordinates. Parameters: fraction_x ( float ) \u2013 The relative x coordinate. fraction_y ( float ) \u2013 The relative y coordinate. Returns: Tuple [ int , int ] \u2013 The absolute coordinates. Source code in automator/ui_control/controller.py 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 def transform_point ( self , fraction_x : float , fraction_y : float ) -> Tuple [ int , int ]: \"\"\" Transform the relative coordinates to the absolute coordinates. :param fraction_x: The relative x coordinate. :param fraction_y: The relative y coordinate. :return: The absolute coordinates. \"\"\" application_rect : RECT = self . application . rectangle () application_x = application_rect . left application_y = application_rect . top application_width = application_rect . width () application_height = application_rect . height () x = application_x + int ( application_width * fraction_x ) y = application_y + int ( application_height * fraction_y ) return x , y","title":"transform_point"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.wait_enabled","text":"Wait until the control is enabled. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 256 257 258 259 260 261 262 263 264 265 266 267 def wait_enabled ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the control is enabled. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_enabled (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not enabled.\" ) break","title":"wait_enabled"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.wait_visible","text":"Wait until the window is enabled. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 269 270 271 272 273 274 275 276 277 278 279 280 def wait_visible ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the window is enabled. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_visible (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not visible.\" ) break","title":"wait_visible"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.wheel_mouse_input","text":"Wheel mouse input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the wheel mouse input method. Returns: \u2013 The result of the wheel mouse input action. Source code in automator/ui_control/controller.py 224 225 226 227 228 229 230 def wheel_mouse_input ( self , params : Dict [ str , str ]): \"\"\" Wheel mouse input on the control element. :param params: The arguments of the wheel mouse input method. :return: The result of the wheel mouse input action. \"\"\" return self . atomic_execution ( \"wheel_mouse_input\" , params )","title":"wheel_mouse_input"},{"location":"automator/ui_automator/#command","text":"The command of the UI Automator is the ControlCommand class defined in the ufo/automator/ui_control/controller/ControlCommand module. It encapsulates the function and parameters required to execute the action. The ControlCommand class is a base class for all commands in the UI Automator application. Below is an example of a ClickInputCommand class that inherits from the ControlCommand class: @ControlReceiver.register class ClickInputCommand(ControlCommand): \"\"\" The click input command class. \"\"\" def execute(self) -> str: \"\"\" Execute the click input command. :return: The result of the click input command. \"\"\" return self.receiver.click_input(self.params) @classmethod def name(cls) -> str: \"\"\" Get the name of the atomic command. :return: The name of the atomic command. \"\"\" return \"click_input\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a specific ControlReceiver to be executed using the @ControlReceiver.register decorator. Below is the list of available commands in the UI Automator that are currently supported by UFO: Command Name Function Name Description ClickInputCommand click_input Click the control item with the mouse. ClickOnCoordinatesCommand click_on_coordinates Click on the specific fractional coordinates of the application window. DragOnCoordinatesCommand drag_on_coordinates Drag the mouse on the specific fractional coordinates of the application window. SetEditTextCommand set_edit_text Add new text to the control item. GetTextsCommand texts Get the text of the control item. WheelMouseInputCommand wheel_mouse_input Scroll the control item. KeyboardInputCommand keyboard_input Simulate the keyboard input. Tip Please refer to the ufo/prompts/share/base/api.yaml file for the detailed API documentation of the UI Automator. Tip You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand module.","title":"Command"},{"location":"automator/web_automator/","text":"Web Automator We also support the use of the Web Automator to get the content of a web page. The Web Automator is implemented in ufo/autoamtor/app_apis/web module. Configuration There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only msedge.exe and chrome.exe are currently supported by the Web Automator. Receiver The Web Automator receiver is the WebReceiver class defined in the ufo/automator/app_apis/web/webclient.py module: Bases: ReceiverBasic The base class for Web COM client using crawl4ai. Initialize the Web COM client. Source code in automator/app_apis/web/webclient.py 21 22 23 24 25 26 27 def __init__ ( self ) -> None : \"\"\" Initialize the Web COM client. \"\"\" self . _headers = { \"User-Agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3\" } web_crawler ( url , ignore_link ) Run the crawler with various options. Parameters: url ( str ) \u2013 The URL of the webpage. ignore_link ( bool ) \u2013 Whether to ignore the links. Returns: str \u2013 The result markdown content. Source code in automator/app_apis/web/webclient.py 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 def web_crawler ( self , url : str , ignore_link : bool ) -> str : \"\"\" Run the crawler with various options. :param url: The URL of the webpage. :param ignore_link: Whether to ignore the links. :return: The result markdown content. \"\"\" try : # Get the HTML content of the webpage response = requests . get ( url , headers = self . _headers ) response . raise_for_status () html_content = response . text # Convert the HTML content to markdown h = html2text . HTML2Text () h . ignore_links = ignore_link markdown_content = h . handle ( html_content ) return markdown_content except requests . RequestException as e : print ( f \"Error fetching the URL: { e } \" ) return f \"Error fetching the URL: { e } \" Command We now only support one command in the Web Automator to get the content of a web page into a markdown format. More commands will be added in the future for the Web Automator. @WebReceiver.register class WebCrawlerCommand(WebCommand): \"\"\" The command to run the crawler with various options. \"\"\" def execute(self): \"\"\" Execute the command to run the crawler. :return: The result content. \"\"\" return self.receiver.web_crawler( url=self.params.get(\"url\"), ignore_link=self.params.get(\"ignore_link\", False), ) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"web_crawler\" Below is the list of available commands in the Web Automator that are currently supported by UFO: Command Name Function Name Description WebCrawlerCommand web_crawler Get the content of a web page into a markdown format. Tip Please refer to the ufo/prompts/apps/web/api.yaml file for the prompt details for the WebCrawlerCommand command.","title":"Web Automator"},{"location":"automator/web_automator/#web-automator","text":"We also support the use of the Web Automator to get the content of a web page. The Web Automator is implemented in ufo/autoamtor/app_apis/web module.","title":"Web Automator"},{"location":"automator/web_automator/#configuration","text":"There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only msedge.exe and chrome.exe are currently supported by the Web Automator.","title":"Configuration"},{"location":"automator/web_automator/#receiver","text":"The Web Automator receiver is the WebReceiver class defined in the ufo/automator/app_apis/web/webclient.py module: Bases: ReceiverBasic The base class for Web COM client using crawl4ai. Initialize the Web COM client. Source code in automator/app_apis/web/webclient.py 21 22 23 24 25 26 27 def __init__ ( self ) -> None : \"\"\" Initialize the Web COM client. \"\"\" self . _headers = { \"User-Agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3\" }","title":"Receiver"},{"location":"automator/web_automator/#automator.app_apis.web.webclient.WebReceiver.web_crawler","text":"Run the crawler with various options. Parameters: url ( str ) \u2013 The URL of the webpage. ignore_link ( bool ) \u2013 Whether to ignore the links. Returns: str \u2013 The result markdown content. Source code in automator/app_apis/web/webclient.py 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 def web_crawler ( self , url : str , ignore_link : bool ) -> str : \"\"\" Run the crawler with various options. :param url: The URL of the webpage. :param ignore_link: Whether to ignore the links. :return: The result markdown content. \"\"\" try : # Get the HTML content of the webpage response = requests . get ( url , headers = self . _headers ) response . raise_for_status () html_content = response . text # Convert the HTML content to markdown h = html2text . HTML2Text () h . ignore_links = ignore_link markdown_content = h . handle ( html_content ) return markdown_content except requests . RequestException as e : print ( f \"Error fetching the URL: { e } \" ) return f \"Error fetching the URL: { e } \"","title":"web_crawler"},{"location":"automator/web_automator/#command","text":"We now only support one command in the Web Automator to get the content of a web page into a markdown format. More commands will be added in the future for the Web Automator. @WebReceiver.register class WebCrawlerCommand(WebCommand): \"\"\" The command to run the crawler with various options. \"\"\" def execute(self): \"\"\" Execute the command to run the crawler. :return: The result content. \"\"\" return self.receiver.web_crawler( url=self.params.get(\"url\"), ignore_link=self.params.get(\"ignore_link\", False), ) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"web_crawler\" Below is the list of available commands in the Web Automator that are currently supported by UFO: Command Name Function Name Description WebCrawlerCommand web_crawler Get the content of a web page into a markdown format. Tip Please refer to the ufo/prompts/apps/web/api.yaml file for the prompt details for the WebCrawlerCommand command.","title":"Command"},{"location":"automator/wincom_automator/","text":"API Automator UFO currently support the use of Win32 API API automator to interact with the application's native API. We implement them in python using the pywin32 library. The API automator now supports Word and Excel applications, and we are working on extending the support to other applications. Configuration There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only WINWORD.EXE and EXCEL.EXE are currently supported by the API Automator. Receiver The base class for the receiver of the API Automator is the WinCOMReceiverBasic class defined in the ufo/automator/app_apis/basic module. It is initialized with the application's win32 com object and provides functionalities to interact with the application's native API. Below is the reference for the WinCOMReceiverBasic class: Bases: ReceiverBasic The base class for Windows COM client. Initialize the Windows COM client. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. clsid ( str ) \u2013 The CLSID of the COM object. Source code in automator/app_apis/basic.py 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 def __init__ ( self , app_root_name : str , process_name : str , clsid : str ) -> None : \"\"\" Initialize the Windows COM client. :param app_root_name: The app root name. :param process_name: The process name. :param clsid: The CLSID of the COM object. \"\"\" self . app_root_name = app_root_name self . process_name = process_name self . clsid = clsid self . client = win32com . client . Dispatch ( self . clsid ) self . com_object = self . get_object_from_process_name () full_path : str property Get the full path of the process. Returns: str \u2013 The full path of the process. app_match ( object_name_list ) Check if the process name matches the app root. Parameters: object_name_list ( List [ str ] ) \u2013 The list of object name. Returns: str \u2013 The matched object name. Source code in automator/app_apis/basic.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def app_match ( self , object_name_list : List [ str ]) -> str : \"\"\" Check if the process name matches the app root. :param object_name_list: The list of object name. :return: The matched object name. \"\"\" suffix = self . get_suffix_mapping () if self . process_name . endswith ( suffix ): clean_process_name = self . process_name [: - len ( suffix )] else : clean_process_name = self . process_name if not object_name_list : return \"\" return max ( object_name_list , key = lambda x : self . longest_common_substring_length ( clean_process_name , x ), ) close () Close the app. Source code in automator/app_apis/basic.py 110 111 112 113 114 115 116 117 def close ( self ) -> None : \"\"\" Close the app. \"\"\" try : self . com_object . Close () except : pass get_object_from_process_name () abstractmethod Get the object from the process name. Source code in automator/app_apis/basic.py 36 37 38 39 40 41 @abstractmethod def get_object_from_process_name ( self ) -> win32com . client . CDispatch : \"\"\" Get the object from the process name. \"\"\" pass get_suffix_mapping () Get the suffix mapping. Returns: Dict [ str , str ] \u2013 The suffix mapping. Source code in automator/app_apis/basic.py 43 44 45 46 47 48 49 50 51 52 53 54 55 def get_suffix_mapping ( self ) -> Dict [ str , str ]: \"\"\" Get the suffix mapping. :return: The suffix mapping. \"\"\" suffix_mapping = { \"WINWORD.EXE\" : \"docx\" , \"EXCEL.EXE\" : \"xlsx\" , \"POWERPNT.EXE\" : \"pptx\" , \"olk.exe\" : \"msg\" , } return suffix_mapping . get ( self . app_root_name , None ) longest_common_substring_length ( str1 , str2 ) staticmethod Get the longest common substring of two strings. Parameters: str1 ( str ) \u2013 The first string. str2 ( str ) \u2013 The second string. Returns: int \u2013 The length of the longest common substring. Source code in automator/app_apis/basic.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 @staticmethod def longest_common_substring_length ( str1 : str , str2 : str ) -> int : \"\"\" Get the longest common substring of two strings. :param str1: The first string. :param str2: The second string. :return: The length of the longest common substring. \"\"\" m = len ( str1 ) n = len ( str2 ) dp = [[ 0 ] * ( n + 1 ) for _ in range ( m + 1 )] max_length = 0 for i in range ( 1 , m + 1 ): for j in range ( 1 , n + 1 ): if str1 [ i - 1 ] == str2 [ j - 1 ]: dp [ i ][ j ] = dp [ i - 1 ][ j - 1 ] + 1 if dp [ i ][ j ] > max_length : max_length = dp [ i ][ j ] else : dp [ i ][ j ] = 0 return max_length save () Save the current state of the app. Source code in automator/app_apis/basic.py 91 92 93 94 95 96 97 98 def save ( self ) -> None : \"\"\" Save the current state of the app. \"\"\" try : self . com_object . Save () except : pass save_to_xml ( file_path ) Save the current state of the app to XML. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/app_apis/basic.py 100 101 102 103 104 105 106 107 108 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. :param file_path: The file path to save the XML. \"\"\" try : self . com_object . SaveAs ( file_path , self . xml_format_code ) except : pass The receiver of Word and Excel applications inherit from the WinCOMReceiverBasic class. The WordReceiver and ExcelReceiver classes are defined in the ufo/automator/app_apis/word and ufo/automator/app_apis/excel modules, respectively: Command The command of the API Automator for the Word and Excel applications in located in the client module in the ufo/automator/app_apis/{app_name} folder inheriting from the WinCOMCommand class. It encapsulates the function and parameters required to execute the action. Below is an example of a WordCommand class that inherits from the SelectTextCommand class: @WordWinCOMReceiver.register class SelectTextCommand(WinCOMCommand): \"\"\" The command to select text. \"\"\" def execute(self): \"\"\" Execute the command to select text. :return: The selected text. \"\"\" return self.receiver.select_text(self.params.get(\"text\")) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"select_text\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a concrete WinCOMReceiver to be executed using the register decorator. Below is the list of available commands in the API Automator that are currently supported by UFO: Word API Commands Command Name Function Name Description InsertTableCommand insert_table Insert a table to a Word document. SelectTextCommand select_text Select the text in a Word document. SelectTableCommand select_table Select a table in a Word document. Excel API Commands Command Name Function Name Description GetSheetContentCommand get_sheet_content Get the content of a sheet in the Excel app. Table2MarkdownCommand table2markdown Convert the table content in a sheet of the Excel app to markdown format. InsertExcelTableCommand insert_excel_table Insert a table to the Excel sheet. Tip Please refer to the ufo/prompts/apps/{app_name}/api.yaml file for the prompt details for the commands. Tip You can customize the commands by adding new command classes to the ufo/automator/app_apis/{app_name}/ module.","title":"API Automator"},{"location":"automator/wincom_automator/#api-automator","text":"UFO currently support the use of Win32 API API automator to interact with the application's native API. We implement them in python using the pywin32 library. The API automator now supports Word and Excel applications, and we are working on extending the support to other applications.","title":"API Automator"},{"location":"automator/wincom_automator/#configuration","text":"There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only WINWORD.EXE and EXCEL.EXE are currently supported by the API Automator.","title":"Configuration"},{"location":"automator/wincom_automator/#receiver","text":"The base class for the receiver of the API Automator is the WinCOMReceiverBasic class defined in the ufo/automator/app_apis/basic module. It is initialized with the application's win32 com object and provides functionalities to interact with the application's native API. Below is the reference for the WinCOMReceiverBasic class: Bases: ReceiverBasic The base class for Windows COM client. Initialize the Windows COM client. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. clsid ( str ) \u2013 The CLSID of the COM object. Source code in automator/app_apis/basic.py 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 def __init__ ( self , app_root_name : str , process_name : str , clsid : str ) -> None : \"\"\" Initialize the Windows COM client. :param app_root_name: The app root name. :param process_name: The process name. :param clsid: The CLSID of the COM object. \"\"\" self . app_root_name = app_root_name self . process_name = process_name self . clsid = clsid self . client = win32com . client . Dispatch ( self . clsid ) self . com_object = self . get_object_from_process_name ()","title":"Receiver"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.full_path","text":"Get the full path of the process. Returns: str \u2013 The full path of the process.","title":"full_path"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.app_match","text":"Check if the process name matches the app root. Parameters: object_name_list ( List [ str ] ) \u2013 The list of object name. Returns: str \u2013 The matched object name. Source code in automator/app_apis/basic.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def app_match ( self , object_name_list : List [ str ]) -> str : \"\"\" Check if the process name matches the app root. :param object_name_list: The list of object name. :return: The matched object name. \"\"\" suffix = self . get_suffix_mapping () if self . process_name . endswith ( suffix ): clean_process_name = self . process_name [: - len ( suffix )] else : clean_process_name = self . process_name if not object_name_list : return \"\" return max ( object_name_list , key = lambda x : self . longest_common_substring_length ( clean_process_name , x ), )","title":"app_match"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.close","text":"Close the app. Source code in automator/app_apis/basic.py 110 111 112 113 114 115 116 117 def close ( self ) -> None : \"\"\" Close the app. \"\"\" try : self . com_object . Close () except : pass","title":"close"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.get_object_from_process_name","text":"Get the object from the process name. Source code in automator/app_apis/basic.py 36 37 38 39 40 41 @abstractmethod def get_object_from_process_name ( self ) -> win32com . client . CDispatch : \"\"\" Get the object from the process name. \"\"\" pass","title":"get_object_from_process_name"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.get_suffix_mapping","text":"Get the suffix mapping. Returns: Dict [ str , str ] \u2013 The suffix mapping. Source code in automator/app_apis/basic.py 43 44 45 46 47 48 49 50 51 52 53 54 55 def get_suffix_mapping ( self ) -> Dict [ str , str ]: \"\"\" Get the suffix mapping. :return: The suffix mapping. \"\"\" suffix_mapping = { \"WINWORD.EXE\" : \"docx\" , \"EXCEL.EXE\" : \"xlsx\" , \"POWERPNT.EXE\" : \"pptx\" , \"olk.exe\" : \"msg\" , } return suffix_mapping . get ( self . app_root_name , None )","title":"get_suffix_mapping"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.longest_common_substring_length","text":"Get the longest common substring of two strings. Parameters: str1 ( str ) \u2013 The first string. str2 ( str ) \u2013 The second string. Returns: int \u2013 The length of the longest common substring. Source code in automator/app_apis/basic.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 @staticmethod def longest_common_substring_length ( str1 : str , str2 : str ) -> int : \"\"\" Get the longest common substring of two strings. :param str1: The first string. :param str2: The second string. :return: The length of the longest common substring. \"\"\" m = len ( str1 ) n = len ( str2 ) dp = [[ 0 ] * ( n + 1 ) for _ in range ( m + 1 )] max_length = 0 for i in range ( 1 , m + 1 ): for j in range ( 1 , n + 1 ): if str1 [ i - 1 ] == str2 [ j - 1 ]: dp [ i ][ j ] = dp [ i - 1 ][ j - 1 ] + 1 if dp [ i ][ j ] > max_length : max_length = dp [ i ][ j ] else : dp [ i ][ j ] = 0 return max_length","title":"longest_common_substring_length"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.save","text":"Save the current state of the app. Source code in automator/app_apis/basic.py 91 92 93 94 95 96 97 98 def save ( self ) -> None : \"\"\" Save the current state of the app. \"\"\" try : self . com_object . Save () except : pass","title":"save"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.save_to_xml","text":"Save the current state of the app to XML. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/app_apis/basic.py 100 101 102 103 104 105 106 107 108 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. :param file_path: The file path to save the XML. \"\"\" try : self . com_object . SaveAs ( file_path , self . xml_format_code ) except : pass The receiver of Word and Excel applications inherit from the WinCOMReceiverBasic class. The WordReceiver and ExcelReceiver classes are defined in the ufo/automator/app_apis/word and ufo/automator/app_apis/excel modules, respectively:","title":"save_to_xml"},{"location":"automator/wincom_automator/#command","text":"The command of the API Automator for the Word and Excel applications in located in the client module in the ufo/automator/app_apis/{app_name} folder inheriting from the WinCOMCommand class. It encapsulates the function and parameters required to execute the action. Below is an example of a WordCommand class that inherits from the SelectTextCommand class: @WordWinCOMReceiver.register class SelectTextCommand(WinCOMCommand): \"\"\" The command to select text. \"\"\" def execute(self): \"\"\" Execute the command to select text. :return: The selected text. \"\"\" return self.receiver.select_text(self.params.get(\"text\")) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"select_text\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a concrete WinCOMReceiver to be executed using the register decorator. Below is the list of available commands in the API Automator that are currently supported by UFO:","title":"Command"},{"location":"automator/wincom_automator/#word-api-commands","text":"Command Name Function Name Description InsertTableCommand insert_table Insert a table to a Word document. SelectTextCommand select_text Select the text in a Word document. SelectTableCommand select_table Select a table in a Word document.","title":"Word API Commands"},{"location":"automator/wincom_automator/#excel-api-commands","text":"Command Name Function Name Description GetSheetContentCommand get_sheet_content Get the content of a sheet in the Excel app. Table2MarkdownCommand table2markdown Convert the table content in a sheet of the Excel app to markdown format. InsertExcelTableCommand insert_excel_table Insert a table to the Excel sheet. Tip Please refer to the ufo/prompts/apps/{app_name}/api.yaml file for the prompt details for the commands. Tip You can customize the commands by adding new command classes to the ufo/automator/app_apis/{app_name}/ module.","title":"Excel API Commands"},{"location":"configurations/developer_configuration/","text":"Developer Configuration This section provides detailed information on how to configure the UFO agent for developers. The configuration file config_dev.yaml is located in the ufo/config directory and contains various settings and switches to customize the UFO agent for development purposes. System Configuration The following parameters are included in the system configuration of the UFO agent: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" MAX_STEP The maximum step limit for completing the user request in a session. Integer 100 SLEEP_TIME The sleep time in seconds between each step to wait for the window to be ready. Integer 5 RECTANGLE_TIME The time in seconds for the rectangle display around the selected control. Integer 1 SAFE_GUARD Whether to use the safe guard to ask for user confirmation before performing sensitive operations. Boolean True CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] HISTORY_KEYS The keys of the step history added to the Blackboard for agent decision-making. List [\"Step\", \"Thought\", \"ControlText\", \"Subtask\", \"Action\", \"Comment\", \"Results\", \"UserConfirm\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} PRINT_LOG Whether to print the log in the console. Boolean False CONCAT_SCREENSHOT Whether to concatenate the screenshots into a single image for the LLM input. Boolean False INCLUDE_LAST_SCREENSHOT Whether to include the screenshot from the last step in the observation. Boolean True LOG_LEVEL The log level for the UFO agent. String \"DEBUG\" REQUEST_TIMEOUT The call timeout in seconds for the LLM model. Integer 250 USE_APIS Whether to allow the use of application APIs. Boolean True LOG_XML Whether to log the XML file at every step. Boolean False SCREENSHOT_TO_MEMORY Whether to allow the screenshot to Blackboard for the agent's decision making. Boolean True SAVE_UI_TREE Whether to save the UI tree in the log. Boolean False Main Prompt Configuration Main Prompt Templates The main prompt templates include the prompts in the UFO agent for both system and user roles. Configuration Option Description Type Default Value HOSTAGENT_PROMPT The main prompt template for the HostAgent . String \"ufo/prompts/share/base/host_agent.yaml\" APPAGENT_PROMPT The main prompt template for the AppAgent . String \"ufo/prompts/share/base/app_agent.yaml\" FOLLOWERAGENT_PROMPT The main prompt template for the FollowerAgent . String \"ufo/prompts/share/base/app_agent.yaml\" EVALUATION_PROMPT The prompt template for the evaluation. String \"ufo/prompts/evaluation/evaluate.yaml\" Lite versions of the main prompt templates can be found in the ufo/prompts/share/lite directory to reduce the input size for specific token limits. Example Prompt Templates Example prompt templates are used for demonstration purposes in the UFO agent. Configuration Option Description Type Default Value HOSTAGENT_EXAMPLE_PROMPT The example prompt template for the HostAgent used for demonstration. String \"ufo/prompts/examples/{mode}/host_agent_example.yaml\" APPAGENT_EXAMPLE_PROMPT The example prompt template for the AppAgent used for demonstration. String \"ufo/prompts/examples/{mode}/app_agent_example.yaml\" Lite versions of the example prompt templates can be found in the ufo/prompts/examples/lite/{mode} directory to reduce the input size for demonstration purposes. Experience and Demonstration Learning These configuration parameters are used for experience and demonstration learning in the UFO agent. Configuration Option Description Type Default Value EXPERIENCE_PROMPT The prompt for self-experience learning. String \"ufo/prompts/experience/experience_summary.yaml\" EXPERIENCE_SAVED_PATH The path to save the experience learning data. String \"vectordb/experience/\" DEMONSTRATION_PROMPT The prompt for user demonstration learning. String \"ufo/prompts/demonstration/demonstration_summary.yaml\" DEMONSTRATION_SAVED_PATH The path to save the demonstration learning data. String \"vectordb/demonstration/\" Application API Configuration These prompt configuration parameters are used for the application and control APIs in the UFO agent. Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} pywinauto Configuration The API configuration parameters are used for the pywinauto API in the UFO agent. Configuration Option Description Type Default Value CLICK_API The API used for click action, can be click_input or click . String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text. Boolean False Control Filtering The control filtering configuration parameters are used for control filtering in the agent's observation. Configuration Option Description Type Default Value CONTROL_FILTER The control filter type, can be TEXT , SEMANTIC , or ICON . List [] CONTROL_FILTER_TOP_K_PLAN The control filter effect on top k plans from the agent. Integer 2 CONTROL_FILTER_TOP_K_SEMANTIC The control filter top k for semantic similarity. Integer 15 CONTROL_FILTER_TOP_K_ICON The control filter top k for icon similarity. Integer 15 CONTROL_FILTER_MODEL_SEMANTIC_NAME The control filter model name for semantic similarity. String \"all-MiniLM-L6-v2\" CONTROL_FILTER_MODEL_ICON_NAME The control filter model name for icon similarity. String \"clip-ViT-B-32\" Customizations The customization configuration parameters are used for customizations in the UFO agent. Configuration Option Description Type Default Value ASK_QUESTION Whether to ask the user for a question. Boolean True USE_CUSTOMIZATION Whether to enable the customization. Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20 Evaluation The evaluation configuration parameters are used for the evaluation in the UFO agent. Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True You can customize the configuration parameters in the config_dev.yaml file to suit your development needs and enhance the functionality of the UFO agent.","title":"Developer Configuration"},{"location":"configurations/developer_configuration/#developer-configuration","text":"This section provides detailed information on how to configure the UFO agent for developers. The configuration file config_dev.yaml is located in the ufo/config directory and contains various settings and switches to customize the UFO agent for development purposes.","title":"Developer Configuration"},{"location":"configurations/developer_configuration/#system-configuration","text":"The following parameters are included in the system configuration of the UFO agent: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" MAX_STEP The maximum step limit for completing the user request in a session. Integer 100 SLEEP_TIME The sleep time in seconds between each step to wait for the window to be ready. Integer 5 RECTANGLE_TIME The time in seconds for the rectangle display around the selected control. Integer 1 SAFE_GUARD Whether to use the safe guard to ask for user confirmation before performing sensitive operations. Boolean True CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] HISTORY_KEYS The keys of the step history added to the Blackboard for agent decision-making. List [\"Step\", \"Thought\", \"ControlText\", \"Subtask\", \"Action\", \"Comment\", \"Results\", \"UserConfirm\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} PRINT_LOG Whether to print the log in the console. Boolean False CONCAT_SCREENSHOT Whether to concatenate the screenshots into a single image for the LLM input. Boolean False INCLUDE_LAST_SCREENSHOT Whether to include the screenshot from the last step in the observation. Boolean True LOG_LEVEL The log level for the UFO agent. String \"DEBUG\" REQUEST_TIMEOUT The call timeout in seconds for the LLM model. Integer 250 USE_APIS Whether to allow the use of application APIs. Boolean True LOG_XML Whether to log the XML file at every step. Boolean False SCREENSHOT_TO_MEMORY Whether to allow the screenshot to Blackboard for the agent's decision making. Boolean True SAVE_UI_TREE Whether to save the UI tree in the log. Boolean False","title":"System Configuration"},{"location":"configurations/developer_configuration/#main-prompt-configuration","text":"","title":"Main Prompt Configuration"},{"location":"configurations/developer_configuration/#main-prompt-templates","text":"The main prompt templates include the prompts in the UFO agent for both system and user roles. Configuration Option Description Type Default Value HOSTAGENT_PROMPT The main prompt template for the HostAgent . String \"ufo/prompts/share/base/host_agent.yaml\" APPAGENT_PROMPT The main prompt template for the AppAgent . String \"ufo/prompts/share/base/app_agent.yaml\" FOLLOWERAGENT_PROMPT The main prompt template for the FollowerAgent . String \"ufo/prompts/share/base/app_agent.yaml\" EVALUATION_PROMPT The prompt template for the evaluation. String \"ufo/prompts/evaluation/evaluate.yaml\" Lite versions of the main prompt templates can be found in the ufo/prompts/share/lite directory to reduce the input size for specific token limits.","title":"Main Prompt Templates"},{"location":"configurations/developer_configuration/#example-prompt-templates","text":"Example prompt templates are used for demonstration purposes in the UFO agent. Configuration Option Description Type Default Value HOSTAGENT_EXAMPLE_PROMPT The example prompt template for the HostAgent used for demonstration. String \"ufo/prompts/examples/{mode}/host_agent_example.yaml\" APPAGENT_EXAMPLE_PROMPT The example prompt template for the AppAgent used for demonstration. String \"ufo/prompts/examples/{mode}/app_agent_example.yaml\" Lite versions of the example prompt templates can be found in the ufo/prompts/examples/lite/{mode} directory to reduce the input size for demonstration purposes.","title":"Example Prompt Templates"},{"location":"configurations/developer_configuration/#experience-and-demonstration-learning","text":"These configuration parameters are used for experience and demonstration learning in the UFO agent. Configuration Option Description Type Default Value EXPERIENCE_PROMPT The prompt for self-experience learning. String \"ufo/prompts/experience/experience_summary.yaml\" EXPERIENCE_SAVED_PATH The path to save the experience learning data. String \"vectordb/experience/\" DEMONSTRATION_PROMPT The prompt for user demonstration learning. String \"ufo/prompts/demonstration/demonstration_summary.yaml\" DEMONSTRATION_SAVED_PATH The path to save the demonstration learning data. String \"vectordb/demonstration/\"","title":"Experience and Demonstration Learning"},{"location":"configurations/developer_configuration/#application-api-configuration","text":"These prompt configuration parameters are used for the application and control APIs in the UFO agent. Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"}","title":"Application API Configuration"},{"location":"configurations/developer_configuration/#pywinauto-configuration","text":"The API configuration parameters are used for the pywinauto API in the UFO agent. Configuration Option Description Type Default Value CLICK_API The API used for click action, can be click_input or click . String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text. Boolean False","title":"pywinauto Configuration"},{"location":"configurations/developer_configuration/#control-filtering","text":"The control filtering configuration parameters are used for control filtering in the agent's observation. Configuration Option Description Type Default Value CONTROL_FILTER The control filter type, can be TEXT , SEMANTIC , or ICON . List [] CONTROL_FILTER_TOP_K_PLAN The control filter effect on top k plans from the agent. Integer 2 CONTROL_FILTER_TOP_K_SEMANTIC The control filter top k for semantic similarity. Integer 15 CONTROL_FILTER_TOP_K_ICON The control filter top k for icon similarity. Integer 15 CONTROL_FILTER_MODEL_SEMANTIC_NAME The control filter model name for semantic similarity. String \"all-MiniLM-L6-v2\" CONTROL_FILTER_MODEL_ICON_NAME The control filter model name for icon similarity. String \"clip-ViT-B-32\"","title":"Control Filtering"},{"location":"configurations/developer_configuration/#customizations","text":"The customization configuration parameters are used for customizations in the UFO agent. Configuration Option Description Type Default Value ASK_QUESTION Whether to ask the user for a question. Boolean True USE_CUSTOMIZATION Whether to enable the customization. Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20","title":"Customizations"},{"location":"configurations/developer_configuration/#evaluation","text":"The evaluation configuration parameters are used for the evaluation in the UFO agent. Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True You can customize the configuration parameters in the config_dev.yaml file to suit your development needs and enhance the functionality of the UFO agent.","title":"Evaluation"},{"location":"configurations/pricing_configuration/","text":"Pricing Configuration We provide a configuration file pricing_config.yaml to calculate the pricing of the UFO agent using different LLM APIs. The pricing configuration file is located in the ufo/config directory. Note that the pricing configuration file is only used for reference and may not be up-to-date. Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information. You can also customize the pricing configuration file based on the configured model names and their respective input and output prices by adding or modifying the pricing information in the pricing_config.yaml file. Below is the default pricing configuration: # Prices in $ per 1000 tokens # Last updated: 2024-05-13 PRICES: { \"openai/gpt-4-0613\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-3.5-turbo-0613\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-vision-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-4-32k\": {\"input\": 0.06, \"output\": 0.12}, \"openai/gpt-4-turbo\": {\"input\":0.01,\"output\": 0.03}, \"openai/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"openai/gpt-4o-2024-05-13\": {\"input\": 0.005, \"output\": 0.015}, \"openai/gpt-3.5-turbo-0125\": {\"input\": 0.0005, \"output\": 0.0015}, \"openai/gpt-3.5-turbo-1106\": {\"input\": 0.001, \"output\": 0.002}, \"openai/gpt-3.5-turbo-instruct\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-3.5-turbo-16k-0613\": {\"input\": 0.003, \"output\": 0.004}, \"openai/whisper-1\": {\"input\": 0.006, \"output\": 0.006}, \"openai/tts-1\": {\"input\": 0.015, \"output\": 0.015}, \"openai/tts-hd-1\": {\"input\": 0.03, \"output\": 0.03}, \"openai/text-embedding-ada-002-v2\": {\"input\": 0.0001, \"output\": 0.0001}, \"openai/text-davinci:003\": {\"input\": 0.02, \"output\": 0.02}, \"openai/text-ada-001\": {\"input\": 0.0004, \"output\": 0.0004}, \"azure/gpt-35-turbo-20220309\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-20230613\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-16k-20230613\":{\"input\": 0.003, \"output\": 0.004}, \"azure/gpt-35-turbo-1106\":{\"input\": 0.001, \"output\": 0.002}, \"azure/gpt-4-20230321\":{\"input\": 0.03, \"output\": 0.06}, \"azure/gpt-4-32k-20230321\":{\"input\": 0.06, \"output\": 0.12}, \"azure/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-visual-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-turbo-20240409\": {\"input\":0.01,\"output\": 0.03}, \"azure/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"azure/gpt-4o-20240513\": {\"input\": 0.005, \"output\": 0.015}, \"qwen/qwen-vl-plus\": {\"input\": 0.008, \"output\": 0.008}, \"qwen/qwen-vl-max\": {\"input\": 0.02, \"output\": 0.02}, \"gemini/gemini-1.5-flash\": {\"input\": 0.00035, \"output\": 0.00105}, \"gemini/gemini-1.5-pro\": {\"input\": 0.0035, \"output\": 0.0105}, \"gemini/gemini-1.0-pro\": {\"input\": 0.0005, \"output\": 0.0015}, } Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.","title":"Model Pricing"},{"location":"configurations/pricing_configuration/#pricing-configuration","text":"We provide a configuration file pricing_config.yaml to calculate the pricing of the UFO agent using different LLM APIs. The pricing configuration file is located in the ufo/config directory. Note that the pricing configuration file is only used for reference and may not be up-to-date. Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information. You can also customize the pricing configuration file based on the configured model names and their respective input and output prices by adding or modifying the pricing information in the pricing_config.yaml file. Below is the default pricing configuration: # Prices in $ per 1000 tokens # Last updated: 2024-05-13 PRICES: { \"openai/gpt-4-0613\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-3.5-turbo-0613\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-vision-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-4-32k\": {\"input\": 0.06, \"output\": 0.12}, \"openai/gpt-4-turbo\": {\"input\":0.01,\"output\": 0.03}, \"openai/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"openai/gpt-4o-2024-05-13\": {\"input\": 0.005, \"output\": 0.015}, \"openai/gpt-3.5-turbo-0125\": {\"input\": 0.0005, \"output\": 0.0015}, \"openai/gpt-3.5-turbo-1106\": {\"input\": 0.001, \"output\": 0.002}, \"openai/gpt-3.5-turbo-instruct\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-3.5-turbo-16k-0613\": {\"input\": 0.003, \"output\": 0.004}, \"openai/whisper-1\": {\"input\": 0.006, \"output\": 0.006}, \"openai/tts-1\": {\"input\": 0.015, \"output\": 0.015}, \"openai/tts-hd-1\": {\"input\": 0.03, \"output\": 0.03}, \"openai/text-embedding-ada-002-v2\": {\"input\": 0.0001, \"output\": 0.0001}, \"openai/text-davinci:003\": {\"input\": 0.02, \"output\": 0.02}, \"openai/text-ada-001\": {\"input\": 0.0004, \"output\": 0.0004}, \"azure/gpt-35-turbo-20220309\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-20230613\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-16k-20230613\":{\"input\": 0.003, \"output\": 0.004}, \"azure/gpt-35-turbo-1106\":{\"input\": 0.001, \"output\": 0.002}, \"azure/gpt-4-20230321\":{\"input\": 0.03, \"output\": 0.06}, \"azure/gpt-4-32k-20230321\":{\"input\": 0.06, \"output\": 0.12}, \"azure/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-visual-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-turbo-20240409\": {\"input\":0.01,\"output\": 0.03}, \"azure/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"azure/gpt-4o-20240513\": {\"input\": 0.005, \"output\": 0.015}, \"qwen/qwen-vl-plus\": {\"input\": 0.008, \"output\": 0.008}, \"qwen/qwen-vl-max\": {\"input\": 0.02, \"output\": 0.02}, \"gemini/gemini-1.5-flash\": {\"input\": 0.00035, \"output\": 0.00105}, \"gemini/gemini-1.5-pro\": {\"input\": 0.0035, \"output\": 0.0105}, \"gemini/gemini-1.0-pro\": {\"input\": 0.0005, \"output\": 0.0015}, } Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.","title":"Pricing Configuration"},{"location":"configurations/user_configuration/","text":"User Configuration An overview of the user configuration options available in UFO. You need to rename the config.yaml.template in the folder ufo/config to config.yaml to configure the LLMs and other custom settings. LLM Configuration You can configure the LLMs for the HOST_AGENT and APP_AGENT separately in the config.yaml file. The FollowerAgent and EvaluationAgent share the same LLM configuration as the APP_AGENT . Additionally, you can configure a backup LLM engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Below are the configuration options for the LLMs, using OpenAI and Azure OpenAI (AOAI) as examples. You can find the settings for other LLM API configurations and usage in the Supported Models section of the documentation. Configuration Option Description Type Default Value VISUAL_MODE Whether to use visual mode to understand screenshots and take actions Boolean True API_TYPE The API type: \"openai\" for the OpenAI API, \"aoai\" for the AOAI API. String \"openai\" API_BASE The API endpoint for the LLM String \"https://api.openai.com/v1/chat/completions\" API_KEY The API key for the LLM String \"sk-\" API_VERSION The version of the API String \"2024-02-15-preview\" API_MODEL The LLM model name String \"gpt-4-vision-preview\" For Azure OpenAI (AOAI) API The following additional configuration option is available for the AOAI API: Configuration Option Description Type Default Value API_DEPLOYMENT_ID The deployment ID, only available for the AOAI API String \"\" Ensure to fill in the necessary API details for both the HOST_AGENT and APP_AGENT to enable UFO to interact with the LLMs effectively. LLM Parameters You can also configure additional parameters for the LLMs in the config.yaml file: Configuration Option Description Type Default Value MAX_TOKENS The maximum token limit for the response completion Integer 2000 MAX_RETRY The maximum retry limit for the response completion Integer 3 TEMPERATURE The temperature of the model: the lower the value, the more consistent the output of the model Float 0.0 TOP_P The top_p of the model: the lower the value, the more conservative the output of the model Float 0.0 TIMEOUT The call timeout in seconds Integer 60 For RAG Configuration to Enhance the UFO Agent You can configure the RAG parameters in the config.yaml file to enhance the UFO agent with additional knowledge sources: RAG Configuration for the Offline Docs Configure the following parameters to allow UFO to use offline documents for the decision-making process: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 1 RAG Configuration for the Bing search Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1 RAG Configuration for experience Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 RAG Configuration for demonstration Configure the following parameters to allow UFO to use the RAG from user demonstration: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use the RAG from its user demonstration Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion choices for the demonstration result Integer 3 Explore the various RAG configurations to enhance the UFO agent with additional knowledge sources and improve its decision-making capabilities.","title":"User Configuration"},{"location":"configurations/user_configuration/#user-configuration","text":"An overview of the user configuration options available in UFO. You need to rename the config.yaml.template in the folder ufo/config to config.yaml to configure the LLMs and other custom settings.","title":"User Configuration"},{"location":"configurations/user_configuration/#llm-configuration","text":"You can configure the LLMs for the HOST_AGENT and APP_AGENT separately in the config.yaml file. The FollowerAgent and EvaluationAgent share the same LLM configuration as the APP_AGENT . Additionally, you can configure a backup LLM engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Below are the configuration options for the LLMs, using OpenAI and Azure OpenAI (AOAI) as examples. You can find the settings for other LLM API configurations and usage in the Supported Models section of the documentation. Configuration Option Description Type Default Value VISUAL_MODE Whether to use visual mode to understand screenshots and take actions Boolean True API_TYPE The API type: \"openai\" for the OpenAI API, \"aoai\" for the AOAI API. String \"openai\" API_BASE The API endpoint for the LLM String \"https://api.openai.com/v1/chat/completions\" API_KEY The API key for the LLM String \"sk-\" API_VERSION The version of the API String \"2024-02-15-preview\" API_MODEL The LLM model name String \"gpt-4-vision-preview\"","title":"LLM Configuration"},{"location":"configurations/user_configuration/#for-azure-openai-aoai-api","text":"The following additional configuration option is available for the AOAI API: Configuration Option Description Type Default Value API_DEPLOYMENT_ID The deployment ID, only available for the AOAI API String \"\" Ensure to fill in the necessary API details for both the HOST_AGENT and APP_AGENT to enable UFO to interact with the LLMs effectively.","title":"For Azure OpenAI (AOAI) API"},{"location":"configurations/user_configuration/#llm-parameters","text":"You can also configure additional parameters for the LLMs in the config.yaml file: Configuration Option Description Type Default Value MAX_TOKENS The maximum token limit for the response completion Integer 2000 MAX_RETRY The maximum retry limit for the response completion Integer 3 TEMPERATURE The temperature of the model: the lower the value, the more consistent the output of the model Float 0.0 TOP_P The top_p of the model: the lower the value, the more conservative the output of the model Float 0.0 TIMEOUT The call timeout in seconds Integer 60","title":"LLM Parameters"},{"location":"configurations/user_configuration/#for-rag-configuration-to-enhance-the-ufo-agent","text":"You can configure the RAG parameters in the config.yaml file to enhance the UFO agent with additional knowledge sources:","title":"For RAG Configuration to Enhance the UFO Agent"},{"location":"configurations/user_configuration/#rag-configuration-for-the-offline-docs","text":"Configure the following parameters to allow UFO to use offline documents for the decision-making process: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 1","title":"RAG Configuration for the Offline Docs"},{"location":"configurations/user_configuration/#rag-configuration-for-the-bing-search","text":"Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1","title":"RAG Configuration for the Bing search"},{"location":"configurations/user_configuration/#rag-configuration-for-experience","text":"Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5","title":"RAG Configuration for experience"},{"location":"configurations/user_configuration/#rag-configuration-for-demonstration","text":"Configure the following parameters to allow UFO to use the RAG from user demonstration: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use the RAG from its user demonstration Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion choices for the demonstration result Integer 3 Explore the various RAG configurations to enhance the UFO agent with additional knowledge sources and improve its decision-making capabilities.","title":"RAG Configuration for demonstration"},{"location":"creating_app_agent/demonstration_provision/","text":"Provide Human Demonstrations to the AppAgent Users or application developers can provide human demonstrations to the AppAgent to guide it in executing similar tasks in the future. The AppAgent uses these demonstrations to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application. How to Prepare Human Demonstrations for the AppAgent? Currently, UFO supports learning from user trajectories recorded by Steps Recorder integrated within Windows. More tools will be supported in the future. Step 1: Recording User Demonstrations Follow the official guidance to use Steps Recorder to record user demonstrations. Step 2: Add Additional Information or Comments as Needed Include any specific details or instructions for UFO to notice by adding comments. Since Steps Recorder doesn't capture typed text, include any necessary typed content in the comments as well. Step 3: Review and Save the Recorded Demonstrations Review the recorded steps and save them to a ZIP file. Refer to the sample_record.zip for an example of recorded steps for a specific request, such as \"sending an email to example@gmail.com to say hi.\" Step 4: Create an Action Trajectory Indexer Once you have your demonstration record ZIP file ready, you can parse it as an example to support RAG for UFO. Follow these steps: # Assume you are in the cloned UFO folder python -m record_processor -r \"To use the Azure OpenAI API, you need to create an account on the Azure OpenAI website. After creating an account, you can deploy the AOAI API and access the API key.
+After obtaining the API key, you can configure the HOST_AGENT
and APP_AGENT
in the config.yaml
file (rename the config_template.yaml
file to config.yaml
) to use the Azure OpenAI API. The following is an example configuration for the Azure OpenAI API:
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "aoai" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
+API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
+API_KEY: "YOUR_KEY", # The aoai API key
+API_VERSION: "2024-02-15-preview", # The version of the API, "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview", # The OpenAI model name, "gpt-4-vision-preview" by default. You may also use "gpt-4o" for using the GPT-4O model.
+API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
+
+If you want to use AAD for authentication, you should also set the following configuration:
+ AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
+ AAD_API_SCOPE: "YOUR_SCOPE", # Set the value to your scope for the llm model
+ AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE" # Set the value to your scope base for the llm model, whose format is API://YOUR_SCOPE_BASE, and the only need is the YOUR_SCOPE_BASE
+
+Tip
+If you set VISUAL_MODE
to True
, make sure the API_DEPLOYMENT_ID
supports visual inputs.
After configuring the HOST_AGENT
and APP_AGENT
with the OpenAI API, you can start using UFO to interact with the AOAI API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.
To use the Claude API, you need to create an account on the Claude website and access the API key.
+You may need to install additional dependencies to use the Claude API. You can install the dependencies using the following command:
+pip install -U anthropic==0.37.1
+
+Configure the HOST_AGENT
and APP_AGENT
in the config.yaml
file (rename the config_template.yaml
file to config.yaml
) to use the Claude API. The following is an example configuration for the Claude API:
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "Claude" ,
+API_KEY: "YOUR_KEY",
+API_MODEL: "YOUR_MODEL"
+
+Tip
+If you set VISUAL_MODE
to True
, make sure the API_MODEL
supports visual inputs.
Tip
+API_MODEL
is the model name of Claude LLM API. You can find the model name in the Claude LLM model list.
After configuring the HOST_AGENT
and APP_AGENT
with the Claude API, you can start using UFO to interact with the Claude API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.
We support and welcome the integration of custom LLM models in UFO. If you have a custom LLM model that you would like to use with UFO, you can follow the steps below to configure the model in UFO.
+Create a custom LLM model and serve it on your local environment.
+Create a python script under the ufo/llm
directory, and implement your own LLM model class by inheriting the BaseService
class in the ufo/llm/base.py
file. We leave a PlaceHolderService
class in the ufo/llm/placeholder.py
file as an example. You must implement the chat_completion
method in your LLM model class to accept a list of messages and return a list of completions for each message.
def chat_completion(
+ self,
+ messages,
+ n,
+ temperature: Optional[float] = None,
+ max_tokens: Optional[int] = None,
+ top_p: Optional[float] = None,
+ **kwargs: Any,
+):
+ """
+ Generates completions for a given list of messages.
+ Args:
+ messages (List[str]): The list of messages to generate completions for.
+ n (int): The number of completions to generate for each message.
+ temperature (float, optional): Controls the randomness of the generated completions. Higher values (e.g., 0.8) make the completions more random, while lower values (e.g., 0.2) make the completions more focused and deterministic. If not provided, the default value from the model configuration will be used.
+ max_tokens (int, optional): The maximum number of tokens in the generated completions. If not provided, the default value from the model configuration will be used.
+ top_p (float, optional): Controls the diversity of the generated completions. Higher values (e.g., 0.8) make the completions more diverse, while lower values (e.g., 0.2) make the completions more focused. If not provided, the default value from the model configuration will be used.
+ **kwargs: Additional keyword arguments to be passed to the underlying completion method.
+ Returns:
+ List[str], None:A list of generated completions for each message and the cost set to be None.
+ Raises:
+ Exception: If an error occurs while making the API request.
+ """
+ pass
+
+After implementing the LLM model class, you can configure the HOST_AGENT
and APP_AGENT
in the config.yaml
file (rename the config_template.yaml
file to config.yaml
) to use the custom LLM model. The following is an example configuration for the custom LLM model:
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "custom_model" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
+API_BASE: "YOUR_ENDPOINT", # The custom LLM API address.
+API_MODEL: "YOUR_MODEL", # The custom LLM model name.
+
+After configuring the HOST_AGENT
and APP_AGENT
with the custom LLM model, you can start using UFO to interact with the custom LLM model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.
To use the Google Gemini API, you need to create an account on the Google Gemini website and access the API key.
+You may need to install additional dependencies to use the Google Gemini API. You can install the dependencies using the following command:
+pip install -U google-generativeai==0.7.0
+
+Configure the HOST_AGENT
and APP_AGENT
in the config.yaml
file (rename the config_template.yaml
file to config.yaml
) to use the Google Gemini API. The following is an example configuration for the Google Gemini API:
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "Gemini" ,
+API_KEY: "YOUR_KEY",
+API_MODEL: "YOUR_MODEL"
+
+Tip
+If you set VISUAL_MODE
to True
, make sure the API_MODEL
supports visual inputs.
Tip
+API_MODEL
is the model name of Gemini LLM API. You can find the model name in the Gemini LLM model list. If you meet the 429
Resource has been exhausted (e.g. check quota)., it may because the rate limit of your Gemini API.
After configuring the HOST_AGENT
and APP_AGENT
with the Gemini API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.
If you want to use the Ollama model, Go to Ollama and follow the instructions to serve a LLM model on your local environment. We provide a short example to show how to configure the ollama in the following, which might change if ollama makes updates.
+## Install ollama on Linux & WSL2
+curl https://ollama.ai/install.sh | sh
+## Run the serving
+ollama serve
+
+Open another terminal and run the following command to test the ollama model:
+ollama run YOUR_MODEL
+
+Info
+When serving LLMs via Ollama, it will by default start a server at http://localhost:11434
, which will later be used as the API base in config.yaml
.
After obtaining the API key, you can configure the HOST_AGENT
and APP_AGENT
in the config.yaml
file (rename the config_template.yaml
file to config.yaml
) to use the Ollama API. The following is an example configuration for the Ollama API:
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "Ollama" ,
+API_BASE: "YOUR_ENDPOINT",
+API_MODEL: "YOUR_MODEL"
+
+Tip
+API_BASE
is the URL started in the Ollama LLM server and API_MODEL
is the model name of Ollama LLM, it should be same as the one you served before. In addition, due to model token limitations, you can use lite version of prompt to have a taste on UFO which can be configured in config_dev.yaml
.
Tip
+If you set VISUAL_MODE
to True
, make sure the API_MODEL
supports visual inputs.
After configuring the HOST_AGENT
and APP_AGENT
with the Ollama API, you can start using UFO to interact with the Ollama API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.
To use the OpenAI API, you need to create an account on the OpenAI website. After creating an account, you can access the API key from the API keys page.
+After obtaining the API key, you can configure the HOST_AGENT
and APP_AGENT
in the config.yaml
file (rename the config_template.yaml
file to config.yaml
) to use the OpenAI API. The following is an example configuration for the OpenAI API:
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "openai" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.
+API_BASE: "https://api.openai.com/v1/chat/completions", # The the OpenAI API endpoint, "https://api.openai.com/v1/chat/completions" for the OpenAI API.
+API_KEY: "sk-", # The OpenAI API key, begin with sk-
+API_VERSION: "2024-02-15-preview", # The version of the API, "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview", # The OpenAI model name, "gpt-4-vision-preview" by default. You may also use "gpt-4o" for using the GPT-4O model.
+
+Tip
+If you set VISUAL_MODE
to True
, make sure the API_MODEL
supports visual inputs. You can find the list of models here.
After configuring the HOST_AGENT
and APP_AGENT
with the OpenAI API, you can start using UFO to interact with the OpenAI API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.
UFO supports a variety of LLM models and APIs. You can customize the model and API used by the HOST_AGENT
and APP_AGENT
in the config.yaml
file. Additionally, you can configure a BACKUP_AGENT
to handle requests when the primary agent fails to respond.
Please refer to the following sections for more information on the supported models and APIs:
+LLMs | +Documentation | +
---|---|
OPENAI |
+OpenAI API | +
Azure OpenAI (AOAI) |
+Azure OpenAI API | +
Gemini |
+Gemini API | +
Claude |
+Claude API | +
QWEN |
+QWEN API | +
Ollama |
+Ollama API | +
Custom |
+Custom API | +
Info
+Each model is implemented as a separate class in the ufo/llm
directory, and uses the functions chat_completion
defined in the BaseService
class of the ufo/llm/base.py
file to obtain responses from the model.
Qwen (Tongyi Qianwen) is developed by Alibaba DAMO Academy. To use the Qwen model, Go to QWen and register an account and get the API key. More details can be found here (in Chinese).
+You may need to install additional dependencies to use the Qwen model. You can install the dependencies using the following command:
+pip install dashscope
+
+Configure the HOST_AGENT
and APP_AGENT
in the config.yaml
file (rename the config_template.yaml
file to config.yaml
) to use the Qwen model. The following is an example configuration for the Qwen model:
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+ API_TYPE: "qwen" , # The API type, "qwen" for the Qwen model.
+ API_KEY: "YOUR_KEY", # The Qwen API key
+ API_MODEL: "YOUR_MODEL" # The Qwen model name
+
+Tip
+If you set VISUAL_MODE
to True
, make sure the API_MODEL
supports visual inputs.
Tip
+API_MODEL
is the model name of Qwen LLM API. You can find the model name in the Qwen LLM model list.
After configuring the HOST_AGENT
and APP_AGENT
with the Qwen model, you can start using UFO to interact with the Qwen model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.