question: here is my adjusted code:

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Path to the locally downloaded GGUF model
model_path = "C:/Users/bower/.ollama/models/blobs/alphamonarch-7B/alphamonarch-7B.Q6_K.gguf"
model_file = "alphamonarch-7b.Q6_K.gguf"

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant”, “content”: “Well, I’m quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

tokenizer_name = "mlabonne/AlphaMonarch-7B-GGUF"

# Initialize the tokenizer from Hugging Face
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_path)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

here is my terminal:

```
PS C:\Users\bower\augmentoolkit> & C:/Users/bower/AppData/Local/Programs/Python/Python311/python.exe "c:/Users/bower/Videos/AI Agent/model test/modeltest.py"
Traceback (most recent call last):
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\utils\_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/mlabonne/AlphaMonarch-7B-GGUF/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py", line 1261, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py", line 1667, in get_hf_file_metadata
    r = _request_wrapper(
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py", line 409, in _request_wrapper
    hf_raise_for_status(response)
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\utils\_errors.py", line 315, in hf_raise_for_status
    raise EntryNotFoundError(message, response) from e
huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error.
(Request ID: Root=1-66032b16-63920963050d63e3691361e5;98fc3e1e-3b53-4387-9ab3-9d90faa099e4)
Entry Not Found for url: https://huggingface.co/mlabonne/AlphaMonarch-7B-GGUF/resolve/main/config.json.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\bower\Videos\AI Agent\model test\modeltest.py", line 15, in <module>
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 782, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 1111, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\configuration_utils.py", line 633, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\configuration_utils.py", line 688, in _get_config_dict
    resolved_config_file = cached_file(
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\hub.py", line 452, in cached_file
    raise EnvironmentError(
OSError: mlabonne/AlphaMonarch-7B-GGUF does not appear to have a file named config.json. Checkout 'https://huggingface.co/mlabonne/AlphaMonarch-7B-GGUF/main' for available files.
PS C:\Users\bower\augmentoolkit>
```
answer: The error indicates a failure while fetching the tokenizer configuration from the Hugging Face Hub. The URL it tried to access (`https://huggingface.co/mlabonne/AlphaMonarch-7B-GGUF/resolve/main/config.json`) returned a 404, which means the `config.json` file that `AutoTokenizer` needs could not be found in the `mlabonne/AlphaMonarch-7B-GGUF` repository. This could be due to several reasons:

1. **Incorrect model or tokenizer name**: the name `mlabonne/AlphaMonarch-7B-GGUF` is misspelled or incomplete.
2. **Private model or tokenizer**: a repository set to private by its owner is not accessible without authentication.
3. **Non-existent model or tokenizer on Hugging Face**: the repository no longer exists, was deleted, or was renamed.

Note also that the final `OSError` in your traceback says the repository "does not appear to have a file named config.json". GGUF repositories often ship only the `.gguf` weight files, without the `config.json` and tokenizer files that `AutoTokenizer.from_pretrained` expects, so loading a tokenizer from a GGUF-only repo can fail even when the repository name is correct.

# Immediate Steps for Resolution:

- **Verify the model/tokenizer name**: double-check on Hugging Face that `mlabonne/AlphaMonarch-7B-GGUF` is the correct, complete name, and browse its file list to see whether it contains tokenizer files at all.
- **Check model accessibility**: if the repository does exist, make sure it is public rather than private.
- **Load the model directly**: you already have the GGUF file on disk, so the model itself does not need to come from the Hub; the failure is only in the tokenizer-fetching step.

Since your model file is local, the focus should be on loading the GGUF directly and pairing it with a tokenizer that matches the model's architecture. If `mlabonne/AlphaMonarch-7B-GGUF` does not provide tokenizer files, use the tokenizer of the original (non-GGUF) model the GGUF was converted from, or a compatible tokenizer from a model of the same architecture.

# For Local Model Loading and Tokenizer:

To avoid hitting the Hub for a configuration that is not there, load the local GGUF file and pick a tokenizer explicitly:

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Use a tokenizer that matches the architecture your GGUF model is based on.
# "gpt2" is purely illustrative; replace it with the tokenizer of the actual base model.
tokenizer_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# Point directly at the local GGUF file instead of fetching configurations from the Hub.
model_path = "C:/Users/bower/.ollama/models/blobs/alphamonarch-7b/alphamonarch-7b.Q6_K.gguf"

# Loading a raw GGUF file this way may need handling that the standard
# from_pretrained path does not provide, especially when no config.json is available.
model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True)

# From here on, adapt the tokenizer usage to your needs, referencing local paths if necessary.
```

Note: `local_files_only=True` tells `from_pretrained()` not to attempt any remote downloads, which is useful when you are working entirely with local files. Which tokenizer to use depends on knowing which architecture `AlphaMonarch-7B-GGUF` aligns with.

# Final Note:

Given the trouble with model and tokenizer configurations, it is worth consulting the documentation and community forums for `intel_extension_for_transformers` and Hugging Face `transformers` about loading GGUF files directly, especially when the corresponding tokenizer or model configuration is not available on the Hub.
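Before changing anything else, it can also help to confirm what files the repository actually contains; a GGUF-only repo usually has no `config.json`, which would explain the 404 on its own. Here is a minimal sketch using `huggingface_hub`, assuming the repo id from your script:

```python
# Sketch: list the files in the Hub repo to confirm whether config.json exists.
# The repo id is taken from your script; adjust it if the repo is named differently or is private.
from huggingface_hub import list_repo_files

repo_id = "mlabonne/AlphaMonarch-7B-GGUF"

try:
    files = list_repo_files(repo_id)
    print(f"Files in {repo_id}:")
    for f in files:
        print(" -", f)
    if "config.json" not in files:
        print("No config.json here - AutoConfig/AutoTokenizer cannot be loaded from this repo.")
except Exception as e:
    # A 404 here usually means a typo in the repo id, a private repo, or a deleted repo.
    print(f"Could not list files for {repo_id}: {e}")
```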
question: ok, now, i got this:

```
PS C:\Users\bower\augmentoolkit> & C:/Users/bower/AppData/Local/Programs/Python/Python311/python.exe "c:/Users/bower/Videos/AI Agent/model test/modeltest.py"
tokenizer_config.json: 100%|██████████| 1.25k/1.25k [00:00<?, ?B/s]
tokenizer.model: 100%|██████████| 493k/493k [00:00<00:00, 5.94MB/s]
tokenizer.json: 100%|██████████| 1.80M/1.80M [00:01<00:00, 1.30MB/s]
special_tokens_map.json: 100%|██████████| 624/624 [00:00<?, ?B/s]
Traceback (most recent call last):
  File "c:\Users\bower\Videos\AI Agent\model test\modeltest.py", line 16, in <module>
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\tokenization_utils_base.py", line 1745, in apply_chat_template
    rendered = compiled_template.render(
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\jinja2\environment.py", line 1301, in render
    self.environment.handle_exception()
  File "C:\Users\bower\AppData\Local\Programs\Python\Python311\Lib\site-packages\jinja2\environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 2, in top-level template code
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'
PS C:\Users\bower\augmentoolkit>
```
answer: The error `'dict object' has no attribute 'content'` means that while rendering the chat template, Jinja2 (the template engine used internally by `apply_chat_template`) tried to read a `content` key from one of the items in `messages` and could not find it. In other words, at least one entry in `messages` does not have the structure the template expects.

# Problem Diagnosis

- The template iterates over `messages` and accesses `message['content']` (and `message['role']`) for each entry; the failure means one of those dictionaries has no `content` key.
- Look closely at the `messages` list in your script: the assistant message is written with curly ("smart") quotes — `{"role": "assistant”, “content”: “Well, …"}`. Python treats everything between the first and last straight quote as a single string, so that dictionary ends up with only a `"role"` key and no `"content"` key, which is exactly the attribute the template cannot find.

# Immediate Steps

- **Use straight ASCII quotes** (`"`) for every key and value in `messages`; replace any `“ ” ’` characters introduced by copy-pasting from a web page or word processor.
- **Verify tokenizer support:** confirm that your installed `transformers` version and the tokenizer you load actually provide a chat template appropriate for the model.
- **Simplify debugging:** temporarily reduce `messages` to a single, known-good dictionary to confirm the mechanism works before reintroducing the full conversation.

# Simplified Debug Example

```python
# Simplified single message to test the functionality
messages = [
    {"role": "user", "content": "What is your favourite condiment?"}
]

# Attempt to use apply_chat_template with the simplified messages
try:
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
except Exception as e:
    print(f"An error occurred: {e}")
```

If this simplified version still fails, the issue lies with the template itself or the library version; if it succeeds, the problem is in how the original `messages` list is written.

# Final Thoughts

- **Check the documentation** for your installed `transformers` version to confirm the data structure `apply_chat_template` expects.
- **Community and support channels:** if the problem persists, the Hugging Face forums or the `transformers` GitHub issues are good places to ask, especially if the behaviour changed between library versions.
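If you want to confirm this programmatically before fixing the quotes, a small check like the following will show which entry is malformed and which chat template will be rendered. This is a sketch that assumes `tokenizer` and `messages` from your script are already in scope:

```python
# Print the chat template that will be rendered (may be None if the tokenizer has none set).
print(getattr(tokenizer, "chat_template", None) or getattr(tokenizer, "default_chat_template", None))

# Flag any message that is missing the keys the template expects.
for i, message in enumerate(messages):
    missing = {"role", "content"} - set(message)
    if missing:
        # repr() makes stray curly quotes inside keys or values easy to spot
        print(f"message {i} is missing {missing}: {message!r}")
```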
question: here is the doc to help us: ""Templates for Chat Models — Introduction: An increasingly common use case for LLMs is chat. In a chat context, rather than continuing a single string of text (as is the case with a standard language model), the model instead continues a conversation that consists of one or more messages, each of which includes a role, like “user” or “assistant”, as well as message text. Much like tokenization, different models expect very different input formats for chat. This is the reason we added chat templates as a feature. Chat templates are part of the tokenizer.
They specify how to convert conversations, represented as lists of messages, into a single tokenizable string in the format that the model expects. Let’s make this concrete with a quick example using the BlenderBot model. BlenderBot has an extremely simple default template, which mostly just adds whitespace between rounds of dialogue: Copied from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill") chat = [ {"role": "user", "content": "Hello, how are you?"}, {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, {"role": "user", "content": "I'd like to show off how chat templating works!"}, ] tokenizer.apply_chat_template(chat, tokenize=False) " Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!</s>" Notice how the entire chat is condensed into a single string. If we use tokenize=True, which is the default setting, that string will also be tokenized for us. To see a more complex template in action, though, let’s use the mistralai/Mistral-7B-Instruct-v0.1 model. Copied from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1") chat = [ {"role": "user", "content": "Hello, how are you?"}, {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, {"role": "user", "content": "I'd like to show off how chat templating works!"}, ] tokenizer.apply_chat_template(chat, tokenize=False) "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]" Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not. How do I use chat templates? As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with role and content keys, and then pass it to the apply_chat_template() method. Once you do that, you’ll get output that’s ready to go! When using chat templates as input for model generation, it’s also a good idea to use add_generation_prompt=True to add a generation prompt. Here’s an example of preparing input for model.generate(), using the Zephyr assistant model: Copied from transformers import AutoModelForCausalLM, AutoTokenizer checkpoint = "HuggingFaceH4/zephyr-7b-beta" tokenizer = AutoTokenizer.from_pretrained(checkpoint) model = AutoModelForCausalLM.from_pretrained(checkpoint) # You may want to use bfloat16 and/or move to GPU here messages = [ { "role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate", }, {"role": "user", "content": "How many helicopters can a human eat in one sitting?"}, ] tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt") print(tokenizer.decode(tokenized_chat[0])) This will yield a string in the input format that Zephyr expects. 
Copied <|system|> You are a friendly chatbot who always responds in the style of a pirate</s> <|user|> How many helicopters can a human eat in one sitting?</s> <|assistant|> Now that our input is formatted correctly for Zephyr, we can use the model to generate a response to the user’s question: Copied outputs = model.generate(tokenized_chat, max_new_tokens=128) print(tokenizer.decode(outputs[0])) This will yield: Copied <|system|> You are a friendly chatbot who always responds in the style of a pirate</s> <|user|> How many helicopters can a human eat in one sitting?</s> <|assistant|> Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all. Arr, ‘twas easy after all! Is there an automated pipeline for chat? Yes, there is! Our text generation pipelines support chat inputs, which makes it easy to use chat models. In the past, we used to use a dedicated “ConversationalPipeline” class, but this has now been deprecated and its functionality has been merged into the TextGenerationPipeline. Let’s try the Zephyr example again, but this time using a pipeline: Copied from transformers import pipeline pipe = pipeline("text-generation", "HuggingFaceH4/zephyr-7b-beta") messages = [ { "role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate", }, {"role": "user", "content": "How many helicopters can a human eat in one sitting?"}, ] print(pipe(messages, max_new_tokens=128)[0]['generated_text'][-1]) # Print the assistant's response Copied {'role': 'assistant', 'content': "Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all."} The pipeline will take care of all the details of tokenization and calling apply_chat_template for you - once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages! What are “generation prompts”? You may have noticed that the apply_chat_template method has an add_generation_prompt argument. This argument tells the template to add tokens that indicate the start of a bot response. 
For example, consider the following chat: Copied messages = [ {"role": "user", "content": "Hi there!"}, {"role": "assistant", "content": "Nice to meet you!"}, {"role": "user", "content": "Can I ask a question?"} ] Here’s what this will look like without a generation prompt, using the ChatML template we saw in the Zephyr example: Copied tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False) """<|im_start|>user Hi there!<|im_end|> <|im_start|>assistant Nice to meet you!<|im_end|> <|im_start|>user Can I ask a question?<|im_end|> """ And here’s what it looks like with a generation prompt: Copied tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) """<|im_start|>user Hi there!<|im_end|> <|im_start|>assistant Nice to meet you!<|im_end|> <|im_start|>user Can I ask a question?<|im_end|> <|im_start|>assistant """ Note that this time, we’ve added the tokens that indicate the start of a bot response. This ensures that when the model generates text it will write a bot response instead of doing something unexpected, like continuing the user’s message. Remember, chat models are still just language models - they’re trained to continue text, and chat is just a special kind of text to them! You need to guide them with appropriate control tokens, so they know what they’re supposed to be doing. Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don’t have any special tokens before bot responses. In these cases, the add_generation_prompt argument will have no effect. The exact effect that add_generation_prompt has will depend on the template being used. Can I use chat templates in training? Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you can simply continue like any other language model training task. When training, you should usually set add_generation_prompt=False, because the added tokens to prompt an assistant response will not be helpful during training. Let’s see an example: Copied from transformers import AutoTokenizer from datasets import Dataset tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta") chat1 = [ {"role": "user", "content": "Which is bigger, the moon or the sun?"}, {"role": "assistant", "content": "The sun."} ] chat2 = [ {"role": "user", "content": "Which is bigger, a virus or a bacterium?"}, {"role": "assistant", "content": "A bacterium."} ] dataset = Dataset.from_dict({"chat": [chat1, chat2]}) dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)}) print(dataset['formatted_chat'][0]) And we get: Copied <|user|> Which is bigger, the moon or the sun?</s> <|assistant|> The sun.</s> From here, just continue training like you would with a standard language modelling task, using the formatted_chat column. Advanced: How do chat templates work? The chat template for a model is stored on the tokenizer.chat_template attribute. If no chat template is set, the default template for that model class is used instead. Let’s take a look at the template for BlenderBot: Copied from transformers import AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill") tokenizer.default_chat_template "{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ ' ' }}{% endif %}{% endfor %}{{ eos_token }}" That’s kind of intimidating. 
Let’s add some newlines and indentation to make it more readable. Note that the first newline after each block as well as any preceding whitespace before a block are ignored by default, using the Jinja trim_blocks and lstrip_blocks flags. However, be cautious - although leading whitespace on each line is stripped, spaces between blocks on the same line are not. We strongly recommend checking that your template isn’t printing extra spaces where it shouldn’t be! Copied {% for message in messages %} {% if message['role'] == 'user' %} {{ ' ' }} {% endif %} {{ message['content'] }} {% if not loop.last %} {{ ' ' }} {% endif %} {% endfor %} {{ eos_token }} If you’ve never seen one of these before, this is a Jinja template. Jinja is a templating language that allows you to write simple code that generates text. In many ways, the code and syntax resembles Python. In pure Python, this template would look something like this: Copied for idx, message in enumerate(messages): if message['role'] == 'user': print(' ') print(message['content']) if not idx == len(messages) - 1: # Check for the last message in the conversation print(' ') print(eos_token) Effectively, the template does three things: For each message, if the message is a user message, add a blank space before it, otherwise print nothing. Add the message content If the message is not the last message, add two spaces after it. After the final message, print the EOS token. This is a pretty simple template - it doesn’t add any control tokens, and it doesn’t support “system” messages, which are a common way to give the model directives about how it should behave in the subsequent conversation. But Jinja gives you a lot of flexibility to do those things! Let’s see a Jinja template that can format inputs similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system messages and slightly different system message handling in general - don’t use this one in your actual code!) Copied {% for message in messages %} {% if message['role'] == 'user' %} {{ bos_token + '[INST] ' + message['content'] + ' [/INST]' }} {% elif message['role'] == 'system' %} {{ '<<SYS>>n' + message['content'] + 'n<</SYS>>nn' }} {% elif message['role'] == 'assistant' %} {{ ' ' + message['content'] + ' ' + eos_token }} {% endif %} {% endfor %} Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based on the “role” of each message, which represents who sent it. User, assistant and system messages are clearly distinguishable to the model because of the tokens they’re wrapped in. Advanced: Adding and editing chat templates How do I create a chat template? Simple, just write a jinja template and set tokenizer.chat_template. You may find it easier to start with an existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template above and add ”[ASST]” and ”[/ASST]” to assistant messages: Copied {% for message in messages %} {% if message['role'] == 'user' %} {{ bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }} {% elif message['role'] == 'system' %} {{ '<<SYS>>n' + message['content'].strip() + 'n<</SYS>>nn' }} {% elif message['role'] == 'assistant' %} {{ '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }} {% endif %} {% endfor %} Now, simply set the tokenizer.chat_template attribute. Next time you use apply_chat_template(), it will use your new template! 
This attribute will be saved in the tokenizer_config.json file, so you can use push_to_hub() to upload your new template to the Hub and make sure everyone’s using the right template for your model! Copied template = tokenizer.chat_template template = template.replace("SYS", "SYSTEM") # Change the system token tokenizer.chat_template = template # Set the new template tokenizer.push_to_hub("model_name") # Upload your new template to the Hub! The method apply_chat_template() which uses your chat template is called by the TextGenerationPipeline class, so once you set the correct chat template, your model will automatically become compatible with TextGenerationPipeline. If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat control tokens as special tokens in the tokenizer. Special tokens are never split, ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your template. This will ensure that text generation tools can correctly figure out when to stop generating text. What are “default” templates? Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards compatibility, we have retained this class-specific handling as default templates, also set at the class level. If a model does not have a chat template set, but there is a default template for its model class, the TextGenerationPipeline class and methods like apply_chat_template will use the class template instead. You can find out what the default template for your tokenizer is by checking the tokenizer.default_chat_template attribute. This is something we do purely for backward compatibility reasons, to avoid breaking any existing workflows. Even when the class template is appropriate for your model, we strongly recommend overriding the default template by setting the chat_template attribute explicitly to make it clear to users that your model has been correctly configured for chat, and to future-proof in case the default templates are ever altered or deprecated. What template should I use? When setting the template for a model that’s already been trained for chat, you should ensure that the template exactly matches the message formatting that the model saw during training, or else you will probably experience performance degradation. This is true even if you’re training the model further - you will probably get the best performance if you keep the chat tokens constant. This is very analogous to tokenization - you generally get the best performance for inference or fine-tuning when you precisely match the tokenization used during training. If you’re training a model from scratch, or fine-tuning a base language model for chat, on the other hand, you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different input formats. Our default template for models that don’t have a class-specific template follows the ChatML format, and this is a good, flexible choice for many use-cases. It looks like this: Copied {% for message in messages %} {{'<|im_start|>' + message['role'] + 'n' + message['content'] + '<|im_end|>' + 'n'}} {% endfor %} If you like this one, here it is in one-liner form, ready to copy into your code. 
The one-liner also includes handy support for generation prompts, but note that it doesn’t add BOS or EOS tokens! If your model expects those, they won’t be added automatically by apply_chat_template - in other words, the text will be tokenized with add_special_tokens=False. This is to avoid potential conflicts between the template and the add_special_tokens logic. If your model expects special tokens, make sure to add them to the template! Copied tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + 'n' + message['content'] + '<|im_end|>' + 'n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistantn' }}{% endif %}" This template wraps each message in <|im_start|> and <|im_end|> tokens, and simply writes the role as a string, which allows for flexibility in the roles you train with. The output looks like this: Copied <|im_start|>system You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|> <|im_start|>user How are you?<|im_end|> <|im_start|>assistant I'm doing great!<|im_end|> The “user”, “system” and “assistant” roles are the standard for chat, and we recommend using them when it makes sense, particularly if you want your model to operate well with TextGenerationPipeline. However, you are not limited to these roles - templating is extremely flexible, and any string can be a role. I want to add some chat templates! How should I get started? If you have any chat models, you should set their tokenizer.chat_template attribute and test it using apply_chat_template(), then push the updated tokenizer to the Hub. This applies even if you’re not the model owner - if you’re using a model with an empty chat template, or one that’s still using the default class template, please open a pull request to the model repository so that this attribute can be set properly! Once the attribute is set, that’s it, you’re done! tokenizer.apply_chat_template will now work correctly for that model, which means it is also automatically supported in places like TextGenerationPipeline! By ensuring that models have this attribute, we can make sure that the whole community gets to use the full power of open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long - it’s time to put an end to them! Advanced: Template writing tips If you’re unfamiliar with Jinja, we generally find that the easiest way to write a chat template is to first write a short Python script that formats messages the way you want, and then convert that script into a template. Remember that the template handler will receive the conversation history as a variable called messages. Each message is a dictionary with two keys, role and content. You will be able to access messages in your template just like you can in Python, which means you can loop over it with {% for message in messages %} or access individual messages with, for example, {{ messages[0] }}. You can also use the following tips to convert your code to Jinja: For loops For loops in Jinja look like this: Copied {% for message in messages %} {{ message['content'] }} {% endfor %} Note that whatever’s inside the {{ expression block }} will be printed to the output. You can use operators like + to combine strings inside expression blocks. 
If statements If statements in Jinja look like this: Copied {% if message['role'] == 'user' %} {{ message['content'] }} {% endif %} Note how where Python uses whitespace to mark the beginnings and ends of for and if blocks, Jinja requires you to explicitly end them with {% endfor %} and {% endif %}. Special variables Inside your template, you will have access to the list of messages, but you can also access several other special variables. These include special tokens like bos_token and eos_token, as well as the add_generation_prompt variable that we discussed above. You can also use the loop variable to access information about the current loop iteration, for example using {% if loop.last %} to check if the current message is the last message in the conversation. Here’s an example that puts these ideas together to add a generation prompt at the end of the conversation if add_generation_prompt is True: Copied {% if loop.last and add_generation_prompt %} {{ bos_token + 'Assistant:n' }} {% endif %} Notes on whitespace As much as possible, we’ve tried to get Jinja to ignore whitespace outside of {{ expressions }}. However, be aware that Jinja is a general-purpose templating engine, and it may treat whitespace between blocks on the same line as significant and print it to the output. We strongly recommend checking that your template isn’t printing extra spaces where it shouldn’t be before you upload it! ""
answer: Based on the documentation you've provided, `apply_chat_template()` is part of the tokenizer: it converts a conversation, represented as a list of messages, into a single tokenizable string in the format the model expects. The documentation's examples show that different models require different chat formats, and that the method expects each message to be a dictionary with `role` and `content` keys.

The error you hit (`jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'`) therefore means one of the entries in `messages` did not expose a `content` key when the template was rendered. In your original script, the assistant message used curly quotes (`"assistant”, “content”: “…`), which collapses that dictionary into a single `"role"` entry, so the template could not find `content`.

# Addressing the Error

- Make sure every dictionary in `messages` has both a `role` and a `content` key, written with straight ASCII quotes.
- Apart from the quotation marks, your use of `apply_chat_template()` matches the documentation's examples.

Here's the corrected structure:

```python
from transformers import AutoTokenizer

tokenizer_name = "mlabonne/AlphaMonarch-7B-GGUF"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# Correctly structured messages (straight quotes, both keys present)
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Applying the chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```

This aligns with the examples and explanations in the documentation and should work, provided that:

1. `mlabonne/AlphaMonarch-7B-GGUF` exists under that exact name on the Hugging Face Hub and ships the tokenizer files.
2. Its tokenizer defines (or inherits) a chat template that matches how the model was trained.
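If the tokenizer loads and has a chat template, one quick way to confirm the fix is to render the prompt and inspect it before any generation. A minimal sketch, assuming the `tokenizer` and `messages` defined above:

```python
# Render the conversation and inspect exactly what the model will receive.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(repr(prompt))  # repr() exposes special tokens, newlines, and any stray characters
```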
question: the author recommends using this model with the Mistral Instruct chat template. Their page is: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 — it reads:

""Instruction format

In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id.

E.g.

text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"

This format is available as a chat template via the apply_chat_template() method:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```""
answer: Given the Mistral Instruct details from MistralAI's model page, and since you're running on an Intel CPU rather than CUDA, here is how to adapt their example to your setup with a local model (`AlphaMonarch-7B` in GGUF format).

# Integrating the Mistral Instruct Chat Template

The `[INST]` and `[/INST]` tokens tell the model where an instruction starts and ends. Importantly, when you call `apply_chat_template()` with the Mistral Instruct tokenizer, the template inserts `<s>`, `[INST]`, and `[/INST]` for you, so the `messages` list should contain plain text only; adding the tokens by hand would duplicate them. The other changes from MistralAI's CUDA example are running on the CPU and pointing the model loader at your local file:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# If you want Intel-specific CPU optimizations, make sure intel_extension_for_pytorch
# is installed; this sketch uses the standard transformers loaders.

device = "cpu"  # model inference runs on the CPU

# Adjust these paths and names for your local setup.
model_path = "C:/Users/bower/.ollama/models/blobs/alphamonarch-7b/alphamonarch-7b.Q6_K.gguf"
tokenizer_name = "mistralai/Mistral-7B-Instruct-v0.1"  # tokenizer that ships the Mistral Instruct chat template

# Load the tokenizer from the Hub and the model from the local file.
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True).to(device)

# Plain message text; the chat template adds <s>, [INST] and [/INST] itself.
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Format and tokenize the conversation in the Mistral Instruct format.
model_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(device)

# apply_chat_template with return_tensors="pt" returns a tensor of input ids, so pass it positionally.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)

# Decode the model output back into readable text.
decoded = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(decoded[0])
```

# Key Adjustments:

- **Device setup**: `device = "cpu"` reflects the Intel CPU environment; the model and the tokenized inputs stay in CPU memory.
- **Local model loading**: `from_pretrained(model_path, local_files_only=True)` points at your local GGUF file instead of downloading weights from the Hub.
- **Conversation formatting**: the messages stay as plain text and `apply_chat_template()` applies the `[INST]...[/INST]` instruction format recommended by MistralAI, so you get the training-time format without hand-writing control tokens.

Make sure `model_path` is correct and that `tokenizer_name` really is a tokenizer compatible with the model you are running. Lastly, review whether loading a GGUF file directly through `AutoModelForCausalLM.from_pretrained` is supported in your execution context; success depends on the specific model loaders in Hugging Face's Transformers library and on any additional configuration the GGUF format requires.
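If `from_pretrained` refuses the raw `.gguf` path, the `intel_extension_for_transformers` README documents a GGUF loading pattern that takes a Hub repo id plus a `model_file` name instead of a local path. The sketch below follows that pattern; the `model_file` keyword, the exact filename, and GGUF support for this particular model are assumptions to verify against your installed version and the repo's file list:

```python
# Sketch based on the GGUF example in the intel_extension_for_transformers README.
# The model_file keyword and the exact .gguf filename are assumptions - check your
# installed version's documentation and the repo's Files tab before relying on this.
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "mlabonne/AlphaMonarch-7B-GGUF"           # Hub repo that hosts the GGUF file
model_file = "alphamonarch-7b.Q6_K.gguf"               # filename inside that repo (verify the casing)
tokenizer_name = "mistralai/Mistral-7B-Instruct-v0.1"  # tokenizer with the Mistral Instruct chat template

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
streamer = TextStreamer(tokenizer)  # prints tokens to stdout as they are generated

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# The chat template adds <s>, [INST] and [/INST]; the messages stay plain text.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=256)
```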