In the last tutorial, we looked at ChatGPT chat conversations and added a simple user interface to a SwiftUI app. Today we will look at integrating DALL-E into your SwiftUI app, using the image generations endpoint that allows you to provide a prompt and generate one to ten images. The app will be similar to what was created yesterday. The difference is that instead of receiving an answer in text form, we’ll receive it as a URL that can then be processed to display on the view.
To begin, we need to create a new iPhone app in Xcode. Select the iOS App template, give it a name, select SwiftUI and Swift, and save your project somewhere.
You must also create an account at openai.com if you haven't already. You then need to go to the API section, add a card, and put some credit on your account. I suggest limiting monthly spending to $10. Images use more credit than the text-based answers from yesterday, but testing a few images won't break the bank. At the time of writing, the smallest images are 1.6 cents each, so generating 10 of them costs 16 cents.
When you have created your account, head over to the API section, go to settings, and create an API key that you can use. Remember not to share this key with anybody, although with a hard limit of $10 set (make sure you do this), the damage a leaked key could do is limited.
Creating the View
Let's prepare the properties that we need to make this work:
// Note: AnyCancellable comes from Combine, so the file needs `import Combine`
// alongside `import SwiftUI`.
@State private var prompt: String = ""
@State private var apiKey: String = ""
@State private var errorText: String? = ""
@State private var results: [String] = []
@State private var cancellable: AnyCancellable? = nil
@State private var selectedImageSize = ImageSizeOption.small
@State private var selectedNumberOfImagesValue = 1
A quick explanation of what each of these does:
- The prompt property keeps track of the image prompt you provide (the text describing the image you want the model to generate)
- The apiKey property is a place for you to paste your OpenAI API key
- The errorText property will contain an error message if the API call runs into a problem (typically that your API key is wrong, or that you're out of credit, for example)
- results is an array of strings that contains all of the image URLs that link to the images you generated
- cancellable holds a reference to the Combine subscription for the HTTP request (more on this later)
- selectedImageSize is an enum that contains the three valid sizes of 256×256, 512×512, and 1024×1024, named small, medium, and large. .small is the default in this example
- selectedNumberOfImagesValue defaults to 1, but the API allows up to 10, with the number being selectable on the view
With these in place, let's take a look at the view:
var body: some View {
    VStack {
        TextField("API Key", text: $apiKey)
            .textFieldStyle(.roundedBorder)
            .padding()

        TextField("Enter Image Prompt", text: $prompt)
            .textFieldStyle(.roundedBorder)
            .padding()

        Picker(selection: $selectedImageSize, label: Text("Image Size")) {
            ForEach(ImageSizeOption.allCases, id: \.self) { sizeOption in
                Text(sizeOption.rawValue).tag(sizeOption)
            }
        }
        .pickerStyle(SegmentedPickerStyle())

        Picker(selection: $selectedNumberOfImagesValue, label: Text("Picker")) {
            ForEach(1...10, id: \.self) { value in
                Text("\(value)").tag(value)
            }
        }
        .pickerStyle(DefaultPickerStyle())

        Button(action: {
            submit()
        }) {
            Image(systemName: "paperplane")
                .font(.title)
        }
        .padding()

        Spacer()

        ScrollView {
            let columnsCount = selectedNumberOfImagesValue < 3 ? selectedNumberOfImagesValue : 3
            let columns: [GridItem] = Array(repeating: .init(.flexible()), count: columnsCount)

            LazyVGrid(columns: columns) {
                ForEach(results, id: \.self) { imageUrlString in
                    if let imageUrl = URL(string: imageUrlString) {
                        AsyncImage(url: imageUrl, scale: 1.0) { image in
                            image
                                .resizable()
                                .aspectRatio(contentMode: .fit)
                        } placeholder: {
                            ProgressView()
                        }
                    }
                }
            }
        }
    }
}
We start by nesting everything within a VStack so that the UI stacks each item vertically, one above the other.
As mentioned earlier, the first TextField is where you enter your API Key, the one provided by OpenAI.
The next TextField is where you enter your prompt to describe what image you want to see.
I chose a picker for the next item in the view. This picker uses segments to let you select the resolution you want the images returned in. Given that these images are thrown away and not saved in this tutorial, choose the smallest resolution, as it is the cheapest. The data for the segments is populated by an enum further down in the code.
The next item is another picker view, but this time using the default picker style (which, depending on your iOS version, may render as a menu rather than a wheel). This is used to choose how many images you want generated. I suggest just one, as they are thrown away after use, but feel free to go for the full 10 if you wish to see it working.
We have a button next that lets you submit your prompt with the options you selected. The action on this button calls the submit() function declared later in the code.
We then have a spacer to push the above code to the top of the screen and the images to the bottom.
Next is a ScrollView, which wraps around the image-displaying code so that if you have more than one image, you can scroll to see them all.
Let's work through the next part of the code, as it is used less frequently than standard TextFields and Buttons.
First, we get the number of columns that we want. If we have three or more images to show, then we want to make three columns. If there are 1 or 2 images, we want just 1 or 2 columns. This code uses a ternary to check if the number of images is less than 3; if so, it returns that number. If it’s three or more, then it defaults to 3 columns.
In the next line, we create an array of GridItem. GridItem is part of SwiftUI to define a flexible or fixed-size column layout in a grid. It takes care of the layout and lets us specify how many columns we want, the number calculated just above.
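For comparison, if you wanted three fixed 100-point-wide columns instead of flexible ones, you could write something like the line below (a hypothetical variation, not something this project needs):

let fixedColumns: [GridItem] = Array(repeating: .init(.fixed(100)), count: 3)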
We then use LazyVGrid to wrap around the view where we show the images. The lines above prepare what we need to set the columns parameter of LazyVGrid.
In the next section of code, we have a ForEach that loops around results, results being an array of strings defined with @State at the top. This will contain string values of all of the images we asked for. ForEach allows us to iterate through each item in the array of strings. The syntax means that imageUrlString will be each URL on each iteration of the loop.
Next, we need to convert the string into a URL, and if that works, we use AsyncImage, which accepts a URL along with some settings.
Notice the image in closure here. At a high level, it means that AsyncImage starts fetching the image; when the image arrives, the closure is called with it, and we put the image on the view with the resizable and aspectRatio modifiers. While the image is loading, the placeholder closure shows a ProgressView instead.
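If you want finer control, for example to show something when a download fails, AsyncImage also has an initializer that hands you an AsyncImagePhase. The tutorial doesn't need it, but a minimal sketch of that variant could look like this:

AsyncImage(url: imageUrl) { phase in
    switch phase {
    case .empty:
        // Still loading: show the same spinner as the placeholder above
        ProgressView()
    case .success(let image):
        image
            .resizable()
            .aspectRatio(contentMode: .fit)
    case .failure:
        // The download failed: show a placeholder symbol instead
        Image(systemName: "photo")
    @unknown default:
        EmptyView()
    }
}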
This concludes the view.
Preparing Structs to Request and Receive Images
struct APIResponse: Decodable {
    // Note: the API sends "created" as a Unix timestamp, so with a default
    // JSONDecoder the Date value will be offset; this app never displays it,
    // but a .secondsSince1970 date decoding strategy would fix that.
    let created: Date
    let data: [Data]
}

struct Data: Decodable {
    // This shadows Foundation's Data type inside the project; write
    // Foundation.Data if you need the original type elsewhere.
    let url: String?
    let b64_json: String?
}

struct APIRequest: Encodable {
    let prompt: String
    let n: Int
    let size: ImageSizeOption
}

enum ImageSizeOption: String, CaseIterable, Identifiable, Codable {
    case small = "256x256"
    case medium = "512x512"
    case large = "1024x1024"

    var id: ImageSizeOption { self }
}
Let’s begin with the APIRequest. This is the struct we use to request images from the API. It contains prompt, which is a string; n, which is the number of images (1–10); and size, which is an enum called ImageSizeOption. Here we specify the three image sizes that are compatible with DALL-E.
APIRequest is Encodable, meaning we can encode it into JSON data needed in the HTTP request.
Next, we have the APIResponse, which conforms to Decodable, meaning that it can be converted from the JSON data that comes back from the API and decoded into a structure we can use. APIResponse contains a date and an array of Data.
Data contains two strings, both optional. This version of the app only works with a URL string, but the struct also makes b64_json available. To make that work, you would need to add response_format as a string to APIRequest and specify “b64_json” as its value, although the image viewer won’t work as it is and would need some modification. The difference between the URL and the b64 data is that with the data you receive the whole image, ready to convert and put on the view, whereas with the URL you still need to fetch the image. Feel free to modify the code and see if you can get it working with base-64 data.
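As a starting point for that experiment, here is a rough, untested sketch of how a b64_json value could be turned into a SwiftUI Image. The helper function is hypothetical and not part of the tutorial project:

import SwiftUI
import UIKit

// Hypothetical helper: converts a b64_json string from the API into a SwiftUI Image.
// Foundation.Data is spelled out because the project declares its own struct named Data.
func image(fromBase64 b64String: String) -> Image? {
    guard let imageData = Foundation.Data(base64Encoded: b64String),
          let uiImage = UIImage(data: imageData) else {
        return nil
    }
    return Image(uiImage: uiImage)
}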
The reason that Data is an array is that you may have requested more than 1 image. An array allows you to receive many URLs back to process.
func submit() {
    guard let url = URL(string: "https://api.openai.com/v1/images/generations") else {
        print("Invalid URL")
        return
    }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    do {
        let payload = APIRequest(prompt: prompt,
                                 n: selectedNumberOfImagesValue,
                                 size: selectedImageSize)
        let jsonData = try JSONEncoder().encode(payload)
        request.httpBody = jsonData
    } catch {
        print("Error: \(error)")
        return
    }

    cancellable = URLSession.shared.dataTaskPublisher(for: request)
        .tryMap { $0.data }
        .decode(type: APIResponse.self, decoder: JSONDecoder())
        .receive(on: DispatchQueue.main)
        .sink(
            receiveCompletion: { completion in
                switch completion {
                case .failure(let error):
                    errorText = "Error: \(error.localizedDescription)"
                case .finished:
                    break
                }
            },
            receiveValue: { response in
                results = response.data.compactMap { $0.url }
                prompt = ""
            }
        )
}
This code will look familiar if you read the last tutorial about making a chat request. I’ll still explain it to help those who have not seen this.
The first job is to create a URL from a string. We guard this; strictly speaking the guard isn't needed here because the URL is a fixed string with nothing dynamic added to it, but it is good practice.
Once we have the URL, we create a URLRequest with it and follow up with some settings, such as setting the bearer token (the API key you paste into the running app) in the Authorization header and setting the Content-Type header to application/json.
Now that the request headers are ready, we need to look at creating the body of the POST request.
The first task is to create the payload, which is the APIRequest with the parameters passed in: the prompt, the number of images, and the image size.
The payload is then encoded to JSON, and with that, we set the request body to that payload.
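As an illustration, with a hypothetical prompt, one image requested, and the small size selected, the encoded JSON body looks roughly like this:

{
  "prompt": "a watercolour painting of a lighthouse",
  "n": 1,
  "size": "256x256"
}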
If all is OK, we can start the request. We subscribe to URLSession's dataTaskPublisher and store the resulting subscription in cancellable, an @State property bound to the view. This means that as long as the view is on screen, the request stays alive; if the view (and its state) goes away, the subscription is torn down.
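If you ever want to cancel an in-flight request yourself, for example when the view disappears, one option (a suggestion, not something the tutorial project does) is to cancel the stored subscription explicitly:

// Hypothetical addition, attached to the VStack in the view body:
.onDisappear {
    cancellable?.cancel()
    cancellable = nil
}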
Next is .tryMap, which takes the (data, response) tuple published by the dataTaskPublisher and keeps just the data. More will follow on this in another tutorial.
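If you wanted to be stricter here, you could also inspect the response half of that tuple and fail early on a non-200 status code. A minimal sketch of that, using URLError(.badServerResponse) as an arbitrary error choice rather than anything from the tutorial project, could replace the .tryMap line:

.tryMap { output -> Foundation.Data in
    // output is the (data, response) tuple; Foundation.Data is spelled out
    // because the project declares its own struct named Data.
    guard let httpResponse = output.response as? HTTPURLResponse,
          httpResponse.statusCode == 200 else {
        throw URLError(.badServerResponse)
    }
    return output.data
}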
We then decode the response and, below that, tell Combine to deliver the result on the main queue, since the result updates the UI.
We then call .sink, which subscribes to the publisher; this is what actually starts the request and waits for the response.
In the receiveCompletion closure, .failure is called when something goes wrong and .finished when the request completes normally. receiveValue is called when data has been received; this is where we set the results property, which then triggers the view to reload.
Ready to Go
With this all in place, you are ready to run and test the app. Paste in your API key, enter a prompt, select resolution, choose how many images, and submit the request. Please remember that none of the images in this app are saved, so I’d recommend limiting your searches to just 1 or 2 low-res images at a time.
Note that errorText is not used, although it is set. Feel free to modify the code to add that to the view.
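One simple way to surface it (a suggestion rather than part of the downloadable project) is to add a conditional Text below the submit button:

// Hypothetical addition to the VStack, below the submit button:
if let errorText = errorText, !errorText.isEmpty {
    Text(errorText)
        .foregroundColor(.red)
        .padding(.horizontal)
}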
The project can be downloaded from here.