YAML

There are many ways to configure your YAML doc to support maximal flexibility.

A YAML doc typically has a train object at its root, along with an object for each class you want to configure.

Classes are configured like this:

firestore:
  clientJson: "secrets/firebase_service_client.json"
  databaseURL: "env:FIREBASE_DATABASE_URL"

drive:
  clientJson: "secrets/drive_service_client.json"

email:
  password: env:GMAIL_PASSWORD
  email: env:GMAIL_EMAIL

huggingface:
  token: env:HUGGINGFACE_API_KEY
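
Several values above start with env:, which reads like a reference to an environment variable rather than a literal. The source does not describe how these are resolved, so the following is only a minimal sketch under that assumption; resolve_env and ENV_PREFIX are hypothetical names, not part of the tool.

```python
import os

ENV_PREFIX = "env:"  # assumed marker for "read this value from the environment"


def resolve_env(value):
    """Recursively replace 'env:NAME' strings with the value of os.environ['NAME']."""
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item) for item in value]
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        name = value[len(ENV_PREFIX):]
        # Raises KeyError when the variable is unset, so a missing secret fails early.
        return os.environ[name]
    return value
```

Failing on an unset variable is a design choice in this sketch; the actual tool may fall back to a default instead.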

Note: CLI-based commands retrieve the YAML doc, merge any args into its root, and process the result accordingly.
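
To make the merge step concrete, here is a rough sketch of merging CLI args into the root of an already-parsed YAML doc. It assumes that CLI args take precedence over YAML keys and that args left unset arrive as None; neither is confirmed by the source, and merge_cli_args is a hypothetical helper.

```python
def merge_cli_args(config, cli_args):
    """Return a copy of config with non-None CLI args written at the root level."""
    merged = dict(config)
    for key, value in cli_args.items():
        if value is not None:  # assumed: None means the flag was not passed
            merged[key] = value
    return merged


yaml_doc = {"verbose": False, "train": {"path": "../shared.csv"}}
merged = merge_cli_args(yaml_doc, {"verbose": True, "limit": None})
```

The merge is shallow: a CLI arg named train would replace the whole train object rather than being merged into it.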

Example 0: Bespoke example with many settings

verbose: true
firestore:
  clientJson: "secrets/firebase_service_client.json"
  databaseURL: "env:FIREBASE_DATABASE_URL"

drive:
  appType: "desktop"
  clientJson: "secrets/drive_service_client.json"
  scopes:
    - "https://www.googleapis.com/auth/drive"
    - "https://www.googleapis.com/auth/drive.metadata.readonly"

email:
  password: env:GMAIL_PASSWORD
  email: env:GMAIL_EMAIL

huggingface:
  hfToken: env:HUGGINGFACE_API_KEY
  baseModel: vilsonrodrigues/falcon-7b-instruct-sharded
  trainedModel: karpathic/falcon-7b-instruct-tuned
  deployToHf: true

train:
  service: firestore
  query:
    filterCollectionWithMultipleWhereClauseWithLimit:
      collection: "chat-state"
      filterKey: ["type"]
      filterData: ["customer-inquiry-bot"]
      operation: ["=="]
      limit: 5
  input:
    value: "chat.0.content"
  output:
    value: "chat.1.content"
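
The value entries in Example 0 (chat.0.content and chat.1.content) read like dotted paths into each returned document, with numeric segments indexing into a list. That interpretation is an assumption; the sketch below shows one way such a path could be resolved, using a hypothetical get_by_path helper and made-up sample data.

```python
def get_by_path(doc, path):
    """Walk a nested dict/list structure following a dot-separated path."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segment: list index
        else:
            current = current[part]  # otherwise: dict key
    return current


# Made-up document shaped like a two-message chat record.
sample = {"chat": [{"content": "Where is my order?"}, {"content": "It ships today."}]}
training_input = get_by_path(sample, "chat.0.content")
training_output = get_by_path(sample, "chat.1.content")
```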

Example 1: Training on a CSV with two columns (or an input and output column).

train:
  path: ../shared.csv

Example 2: Specify Input and Output values in a CSV

train:
  path: ../shared.csv   # Default path for input and output
  inputValue: input     # Attribute to extract from path
  outputValue: output

Example 3: Separate input and output paths, each with an attribute to extract

train:
  inputPath: ../input.csv
  outputPath: ../output.csv
  inputValue: input
  outputValue: output

Example 4: Specifying the path and attribute to extract using input and output objects

train:
  input:
    path: ../input.csv
    value: colname
  output:
    path: ../output.csv
    value: colname

Example 5: Specifying default path for Input and Output

train:
  path: ../shared.csv
  input:
    value: colname
  output:
    value: colname
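
Examples 1 through 5 describe the same two things (where to read from and which attribute to extract) in several shapes. As an illustration only, the sketch below folds those shapes into one form. The precedence order (nested object, then inputPath/outputPath, then the shared path) is an assumption, the output key is assumed to be spelled outputValue, and normalize_train is a hypothetical helper.

```python
def normalize_train(train):
    """Reduce the flat and nested train layouts to {'input': {...}, 'output': {...}}."""
    shared_path = train.get("path")
    normalized = {}
    for side in ("input", "output"):
        nested = train.get(side) or {}
        normalized[side] = {
            # Assumed precedence: nested object, then <side>Path, then shared path.
            "path": nested.get("path") or train.get(side + "Path") or shared_path,
            "value": nested.get("value") or train.get(side + "Value"),
        }
    return normalized


example_3 = {
    "inputPath": "../input.csv",
    "outputPath": "../output.csv",
    "inputValue": "input",
    "outputValue": "output",
}
example_5 = {
    "path": "../shared.csv",
    "input": {"value": "colname"},
    "output": {"value": "colname"},
}
flat = normalize_train(example_3)
nested = normalize_train(example_5)
```

For Example 1, where only path is given, both value entries come back as None, leaving column selection to whatever default the tool applies.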

Querying a service for data is denoted by the query attribute.

It may be placed as a base object, or nested within an 'input' or 'output' object.

The query value follows this schema:

train:
  service: 'serviceName'
  query:
    serviceMethodName: {methodParameters}

Here's an example:

train:
  service: 'firestore'
  query:
    filterCollectionWithMultipleWhereClauseWithLimit:
      collection: "chat-state"
      filterKey: []
      filterData: []
      operation: []
      limit: 5
  input:
    value: "chat.0.content"
  output:
    value: "chat.1.content"

train:
  service: 'gdrive'
  query:
    getFileByName:
      filename: 'test123'
      mimeType: 'application/msword'
      directory: false
      directoryId: false
  input:
    value: "input"
  output:
    value: "output"
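
The query schema (a service name, then one method name mapped to its parameters) suggests a simple dispatch step. The following is a sketch of that idea and not the tool's actual implementation: StubDrive and run_query are hypothetical, and the stub returns an echo of the request instead of calling any real API.

```python
class StubDrive:
    """Hypothetical stand-in for a configured service class."""

    def getFileByName(self, filename, mimeType, directory, directoryId):
        # A real service would call its API here; the stub echoes the request.
        return {"filename": filename, "mimeType": mimeType}


def run_query(services, train):
    """Call the single method named under train['query'] on the chosen service."""
    service = services[train["service"]]
    query = train["query"]
    if len(query) != 1:
        raise ValueError("expected exactly one query method")
    method_name, params = next(iter(query.items()))
    # The YAML parameters become keyword arguments of the service method.
    return getattr(service, method_name)(**params)


train = {
    "service": "gdrive",
    "query": {
        "getFileByName": {
            "filename": "test123",
            "mimeType": "application/msword",
            "directory": False,
            "directoryId": False,
        }
    },
}
result = run_query({"gdrive": StubDrive()}, train)
```

Rejecting a query object with more than one method keeps the configuration unambiguous about which call produces the training data.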

To specify the model you want to train and where to host it:

huggingface:
  hfToken: env:HUGGINGFACE_API_KEY
  baseModel: vilsonrodrigues/falcon-7b-instruct-sharded
  trainedModel: karpathic/falcon-7b-instruct-tuned
  deployToHf: true

Here are the possible query configurations for the Firestore service. Please note that only one is supported at a time.

firebase:
  clientJson: "secrets/firebase_service_client.json"
  databaseURL: "env:FIREBASE_DATABASE_URL"
  query:
    filterCollectionWithWhereClause:
      collection: "organizations"
      filterKey: "organizationId"
      filterData: organizationId
      operation: "=="
    getDocInSubCollection:
      collection1: "your_collection1_name"
      doc1: "your_doc1_name"
      collection2: "your_collection2_name"
      doc2: "your_doc2_name"
    getAllDocumentsInCollectionReference:
      ref: "your_reference"
      collection: "your_collection_name"
    filterCollectionWithWhereClauseWithID:
      collection: "your_collection_name"
      filterKey: "your_filter_key"
      filterData: "your_filter_data"
      operation: "your_operation"
    filterCollectionWithWhereClauseIncludeDocID:
      collection: "your_collection_name"
      filterKey: "your_filter_key"
      filterData: "your_filter_data"
      operation: "your_operation"
    filterCollectionWithWhereClause:
      collection: "your_collection_name"
      filterKey: "your_filter_key"
      filterData: "your_filter_data"
      operation: "your_operation"
    filterCollectionWithMultipleWhereClause:
      collection: "your_collection_name"
      filterKey: ["your_filter_key1", "your_filter_key2"]
      filterData: ["your_filter_data1", "your_filter_data2"]
      operation: ["your_operation1", "your_operation2"]
    filterCollectionWithMultipleWhereClauseWithLimit:
      collection: "your_collection_name"
      filterKey: ["your_filter_key1", "your_filter_key2"]
      filterData: ["your_filter_data1", "your_filter_data2"]
      operation: ["your_operation1", "your_operation2"]
      limit: your_limit
    filterCollectionWithMultipleWhereClauseIncludeDocID:
      collection: "your_collection_name"
      filterKey: ["your_filter_key1", "your_filter_key2"]
      filterData: ["your_filter_data1", "your_filter_data2"]
      operation: ["your_operation1", "your_operation2"]
    filterSubCollectionWithMultipleWhereClauseIncludeDocID:
      collection1: "your_collection1_name"
      doc1: "your_doc1_name"
      collection2: "your_collection2_name"
      filterKey: ["your_filter_key1", "your_filter_key2"]
      filterData: ["your_filter_data1", "your_filter_data2"]
      operation: ["your_operation1", "your_operation2"]

filterCollectionWithMultipleWhereClauseWithLimit:
  collection: "applications"
  filterKey: ["appType", "backgroundResponse"]
  filterData: ["ai-email-assistant", true]
  operation: ["==", "=="]
  limit: 5

output:
  filterCollectionWithWhereClause:
    collection: "organizations"
    filterKey: "organizationId"
    filterData: organizationId
    operation: "=="


filterCollectionWithWhereClauseIncludeDocID:
  collection: collection
  filterKey: "id"
  filterData: userID
  operation: "=="