
Detection

The following functions are methods of the ProfilesData class.

Fog or Condensation

This method allows for the detection of fog or condensation in single profiles.

detect_foc(profiles, method='cloud_base', var='attenuated_backscatter_0', z_snr=2000.0, min_snr=2.0, zmin_cloud=200.0)

Detects fog or condensation. Two methods are available:

  • cloud_base (default): uses the minimum cloud base height.
  • snr (fallback when no cloud information is available): flags profiles whose SNR at a given altitude falls below a prescribed threshold.

Parameters:

Name Type Description Default
profiles ProfilesData

ProfilesData instance.

required
method {cloud_base, snr}

Detection method.

'cloud_base'
var str

Used for 'snr' method. Variable to calculate SNR from.

'attenuated_backscatter_0'
z_snr float

Used for 'snr' method. Altitude AGL (in m) at which we calculate the SNR.

2000.0
min_snr float

Used for 'snr' method. Minimum SNR under which the profile is considered as containing fog or condensation.

2.0
zmin_cloud float

Used for 'cloud_base' method. Altitude AGL (in m) below which a cloud base height is considered a fog or condensation situation.

200.0

Returns:

Type Description
ProfilesData

adds the following (xarray.DataArray) to existing (aprofiles.profiles.ProfilesData):

  • 'foc' (time): mask array corresponding to the presence of foc.
Example
import aprofiles as apro
# read example file
path = "examples/data/L2_0-20000-001492_A20210909.nc"
reader = apro.reader.ReadProfiles(path)
profiles = reader.read()
# foc detection
profiles.foc()
# attenuated backscatter image with foc, up to 6km of altitude
profiles.plot(show_foc=True, zmax=6000., vmin=1e-2, vmax=1e1, log=True)

Fog or condensation (foc) detection
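The cloud_base criterion can be sketched in isolation with plain numpy: a profile is flagged as foc when its minimum cloud base sits below zmin_cloud. The cloud base heights below are hypothetical, not taken from a real file.

```python
import numpy as np

# hypothetical minimum cloud base heights (m AGL) for five profiles; NaN = no cloud detected
cloud_base = np.array([150.0, 800.0, np.nan, 90.0, 2500.0])
zmin_cloud = 200.0  # default threshold of detect_foc

# flag foc where the cloud base lies below zmin_cloud
# (comparisons with NaN are False, so cloud-free profiles are never flagged)
foc = cloud_base < zmin_cloud
print(foc.tolist())  # [True, False, False, True, False]
```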

Source code in aprofiles/detection/foc.py
def detect_foc(profiles, method: Literal["cloud_base", "snr"]="cloud_base", var="attenuated_backscatter_0", z_snr=2000., min_snr=2., zmin_cloud=200.):
    """Detects fog or condensation. Two methods are available:
    - cloud_base (default): uses the minimum cloud base height.
    - snr (fallback when no cloud information is available): flags profiles whose SNR at a given altitude falls below a prescribed threshold.

    Args:
        profiles (aprofiles.profiles.ProfilesData): `ProfilesData` instance.
        method ({'cloud_base', 'snr'}, optional): Detection method.
        var (str, optional): Used for 'snr' method. Variable to calculate SNR from.
        z_snr (float, optional): Used for 'snr' method. Altitude AGL (in m) at which we calculate the SNR.
        min_snr (float, optional): Used for 'snr' method. Minimum SNR under which the profile is considered as containing fog or condensation.
        zmin_cloud (float, optional): Used for 'cloud_base' method. Altitude AGL (in m) below which a cloud base height is considered a fog or condensation situation.

    Returns:
        (aprofiles.profiles.ProfilesData): 
            adds the following (xarray.DataArray) to existing (aprofiles.profiles.ProfilesData):

            - `'foc' (time)`: mask array corresponding to the presence of foc.

    Example:
        ```python
        import aprofiles as apro
        # read example file
        path = "examples/data/L2_0-20000-001492_A20210909.nc"
        reader = apro.reader.ReadProfiles(path)
        profiles = reader.read()
        # foc detection
        profiles.foc()
        # attenuated backscatter image with foc, up to 6km of altitude
        profiles.plot(show_foc=True, zmax=6000., vmin=1e-2, vmax=1e1, log=True)
        ```

        ![Fog or condensation (foc) detection](../../assets/images/foc.png)
    """

    if method == "cloud_base" and "clouds" in profiles.data:
        foc = _detect_fog_from_cloud(profiles, zmin_cloud)
    else:
        foc = _detect_fog_from_snr(profiles, z_snr, var, min_snr)

    # creates dataarray
    profiles.data["foc"] = ("time", foc)
    profiles.data["foc"] = profiles.data.foc.assign_attrs({'long_name': 'Fog or condensation mask'})

    return profiles

Clouds

This method allows for the detection of clouds in attenuated backscatter profiles.

detect_clouds(profiles, method='dec', time_avg=1.0, zmin=0.0, thr_noise=5.0, thr_clouds=4.0, min_snr=0.0, verbose=False)

Module for clouds detection. Two methods are available:

  • "dec" (default): Deep Embedded Clustering (see AI-Profiles).
  • "vg": Vertical Gradient.

Parameters:

Name Type Description Default
profiles ProfilesData

ProfilesData instance.

required
method Literal['dec', 'vg']

cloud detection method used.

'dec'
time_avg float

in minutes, the time over which we aggregate the profiles prior to cloud detection.

1.0
zmin float

altitude AGL, in m, above which we look for clouds. We recommend using the same value as used in the extrapolation_low_layers method (only for 'vg' method).

0.0
thr_noise float

threshold used in the test to determine if a couple (base,peak) is significant: data[peak(z)] - data[base(z)] >= thr_noise * noise(z) (only for 'vg' method).

5.0
thr_clouds float

threshold used to discriminate aerosol from clouds: data[peak(z)] / data[base(z)] >= thr_clouds (only for 'vg' method).

4.0
min_snr float

Minimum SNR required at the clouds peak value to consider the cloud as valid (only for 'vg' method).

0.0
verbose bool

verbose mode. Defaults to False.

False

Returns:

Type Description
ProfilesData

adds the following (xarray.DataArray) to existing (aprofiles.profiles.ProfilesData):

  • 'clouds' (time, altitude): mask array corresponding to data flagged as a cloud.

Example 1
import aprofiles as apro
# read example file
path = "examples/data/L2_0-20000-001492_A20210909.nc"
reader = apro.reader.ReadProfiles(path)
profiles = reader.read()
# clouds detection
profiles.clouds(method="dec")
# attenuated backscatter image with clouds
profiles.plot(show_clouds=True, vmin=1e-2, vmax=1e1, log=True)

Clouds detection

Example 2
import aprofiles as apro
# read example file
path = "examples/data/L2_0-20000-001492_A20210909.nc"
reader = apro.reader.ReadProfiles(path)
profiles = reader.read()
# clouds detection
profiles.clouds(method="vg", zmin=300.)
# attenuated backscatter image with clouds
profiles.plot(show_clouds=True, vmin=1e-2, vmax=1e1, log=True)

Clouds detection
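For the 'vg' method, the two thresholds act as follows; this is a minimal sketch with hypothetical signal values, not taken from a real profile.

```python
# hypothetical values at a candidate (base, peak) couple
base_val = 2.0     # signal at the layer base
peak_val = 12.0    # signal at the layer peak
noise_at_base = 0.5
thr_noise = 5.0    # default of detect_clouds
thr_clouds = 4.0   # default of detect_clouds

# significance test: the peak must exceed the base by more than thr_noise * noise
significant = (peak_val - base_val) >= thr_noise * noise_at_base  # 10.0 >= 2.5

# cloud/aerosol discrimination: the relative jump must exceed thr_clouds
is_cloud = abs((peak_val - base_val) / base_val) > thr_clouds  # 5.0 > 4.0

print(significant, is_cloud)  # True True
```

A layer rejected by the first test is treated as noise; one rejected by the second is kept out of the cloud mask as aerosol.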

Source code in aprofiles/detection/clouds.py
def detect_clouds(profiles: aprofiles.profiles.ProfilesData, method: Literal["dec", "vg"]="dec", time_avg: float=1., zmin: float=0., thr_noise: float=5., thr_clouds: float=4., min_snr: float=0., verbose: bool=False):
    """Module for *clouds detection*.
    Two methods are available: 
    - "dec" (default): Deep Embedded Clustering (see [AI-Profiles](https://github.com/AugustinMortier/ai-profiles)).
    - "vg": Vertical Gradient.

    Args:
        profiles (aprofiles.profiles.ProfilesData): `ProfilesData` instance.
        method (Literal["dec", "vg"]): cloud detection method used.
        time_avg (float, optional): in minutes, the time over which we aggregate the profiles prior to cloud detection.
        zmin (float, optional): altitude AGL, in m, above which we look for clouds. We recommend using the same value as used in the extrapolation_low_layers method (only for 'vg' method).
        thr_noise (float, optional): threshold used in the test to determine if a couple (base,peak) is significant: data[peak(z)] - data[base(z)] >= thr_noise * noise(z) (only for 'vg' method).
        thr_clouds (float, optional): threshold used to discriminate aerosol from clouds: data[peak(z)] / data[base(z)] >= thr_clouds (only for 'vg' method).
        min_snr (float, optional): Minimum SNR required at the clouds peak value to consider the cloud as valid (only for 'vg' method).
        verbose (bool, optional): verbose mode. Defaults to `False`.

    Returns:
        (aprofiles.profiles.ProfilesData): 
            adds the following (xarray.DataArray) to existing (aprofiles.profiles.ProfilesData):
            - `'clouds' (time, altitude)`: Mask array corresponding to data flagged as a cloud.

    Example 1:
        ```python
        import aprofiles as apro
        # read example file
        path = "examples/data/L2_0-20000-001492_A20210909.nc"
        reader = apro.reader.ReadProfiles(path)
        profiles = reader.read()
        # clouds detection
        profiles.clouds(method="dec")
        # attenuated backscatter image with clouds
        profiles.plot(show_clouds=True, vmin=1e-2, vmax=1e1, log=True)
        ```

        ![Clouds detection](../../assets/images/clouds_dec.png)

    Example 2:
        ```python
        import aprofiles as apro
        # read example file
        path = "examples/data/L2_0-20000-001492_A20210909.nc"
        reader = apro.reader.ReadProfiles(path)
        profiles = reader.read()
        # clouds detection
        profiles.clouds(method="vg", zmin=300.)
        # attenuated backscatter image with clouds
        profiles.plot(show_clouds=True, vmin=1e-2, vmax=1e1, log=True)
        ```

        ![Clouds detection](../../assets/images/clouds_vg.png)
    """

    def _vg_clouds(data, zmin, thr_noise, thr_clouds, min_snr):
        # data: 1D range corrected signal (rcs)
        # zmin: altitude AGL, in m, above which we detect clouds
        # thr_noise: threshold used in the test to determine if a couple (base,peak) is significant: data[peak(z)] - data[base(z)] >= thr_noise * noise(z)
        # thr_clouds: threshold used to discriminate aerosol from clouds: data[peak(z)] / data[base(z)] >= thr_clouds
        # min_snr: minimum SNR required at the clouds peak value to consider the cloud as valid

        from scipy import signal
        from scipy.ndimage import uniform_filter1d  # scipy.ndimage.filters is deprecated

        # some useful functions:
        def _get_all_indexes(bases, peaks, tops=[]):
            # get True indexes of bases, peaks and tops, based on masks
            i_bases = utils.get_true_indexes(bases)
            i_peaks = utils.get_true_indexes(peaks)
            i_tops = utils.get_true_indexes(tops)
            return i_bases, i_peaks, i_tops

        def _make_all_masks(data, i_bases, i_peaks, i_tops=[]):
            # return masks for bases, peaks and tops based on input indexes
            bases = utils.make_mask(len(data), i_bases)
            peaks = utils.make_mask(len(data), i_peaks)
            tops = utils.make_mask(len(data), i_tops)
            return bases, peaks, tops

        def _merge_layers(data, i_bases, i_peaks, i_tops):
            # merge layers depending on the altitude of the bases and the tops
            remove_mode = True
            while remove_mode:
                remove_bases, remove_peaks, remove_tops = [], [], []
                for i in range(len(i_bases) - 1):
                    # filters based on the index
                    if i_bases[i + 1] <= i_tops[i]:
                        remove_bases.append(i_bases[i + 1])
                        # remove the weakest peak
                        imin = np.argmin([data[i_peaks[i]], data[i_peaks[i + 1]]])
                        remove_peaks.append(i_peaks[i + imin])
                        # remove lowest top
                        imin = np.argmin([i_tops[i], i_tops[i + 1]])
                        remove_tops.append(i_tops[i + imin])
                        # start again from the bottom
                        break

                i_bases = [i for i in i_bases if i not in remove_bases]
                i_peaks = [i for i in i_peaks if i not in remove_peaks]
                i_tops = [i for i in i_tops if i not in remove_tops]
                if len(remove_bases) == 0:
                    remove_mode = False

            return i_bases, i_peaks, i_tops

        def _find_tops(data, i_bases, i_peaks):
            # finds the top of each layer as the first index above the peak where the signal drops below the base value
            # if no top is found, the layer is removed
            # returns lists of indexes of the bases, peaks and tops
            tops = np.asarray([False for x in np.ones(len(data))])
            # conditions: look for bases above i_peaks[i], and data[top[i]] <= data[base[i]]

            i_tops = []
            for i in range(len(i_bases)):
                mask_value = np.asarray(
                    [
                        True if data[j] < data[i_bases[i]] else False
                        for j in range(len(data))
                    ]
                )
                mask_altitude = np.asarray(
                    [True if j > i_peaks[i] else False for j in range(len(data))]
                )
                # the top is the first value that corresponds to the intersection of the two masks
                cross_mask = np.logical_and(mask_value, mask_altitude)
                i_cross_mask = utils.get_true_indexes(cross_mask)
                if len(i_cross_mask) > 0:
                    if tops[i_cross_mask[0]]:
                        bases[i_bases[i]] = False
                        peaks[i_peaks[i]] = False
                    else:
                        tops[i_cross_mask[0]] = True
                        i_tops.append(i_cross_mask[0])
                else:
                    bases[i_bases[i]] = False
                    peaks[i_peaks[i]] = False
            # keep the tops in detection order: return i_tops directly rather than utils.get_true_indexes(tops)
            return utils.get_true_indexes(bases), utils.get_true_indexes(peaks), i_tops

        def _mask_cloud(profile: np.ndarray, bases: list[int], tops: list[int]) -> np.ndarray:
            """Generate a boolean mask for cloud presence in a given profile.

            Args:
                profile (np.ndarray): 1D array representing a single atmospheric profile.
                bases (list[int]): List of base indices where clouds start.
                tops (list[int]): List of top indices where clouds end.

            Returns:
                np.ndarray: Boolean mask of the same length as profile, where cloud regions are True.
            """
            mask = np.zeros_like(profile, dtype=bool)

            for base, top in zip(bases, tops):
                mask[base:top] = True

            return mask

        #-1. check if any valid data
        if np.isnan(data).all():
            bases, peaks, tops = _make_all_masks(data, [], [], [])
            return {
                "bases": bases,
                "peaks": peaks,
                "tops": tops,
            }

        # 0. rolling average
        avg_data = uniform_filter1d(data, size=10)

        # 1. first derivative
        gradient = np.gradient(avg_data)
        #remove zero values since np.sign(0) = 0
        gradient = [value if value != 0 else 1e-9 for value in gradient]

        # 2. identifies peaks and base by checking the sign changes of the derivative
        sign_changes = np.diff(np.sign(gradient), append=0)
        all_bases = sign_changes == 2
        all_peaks = sign_changes == -2
        # limit to bases above zmin
        imin = profiles._get_index_from_altitude_AGL(zmin)
        all_bases[0:imin] = np.full(imin, False)
        all_peaks[0:imin] = np.full(imin, False)
        # get indexes
        i_bases = utils.get_true_indexes(all_bases)
        i_peaks = utils.get_true_indexes(all_peaks)

        # 3. the signal should start with a base
        if (len(i_bases)>0):  # skip if no base was found (e.g. profile dominated by nan)
            if i_bases[0] > i_peaks[0] and i_peaks[0] >= 1:
                # set base as the minimum between peak and n gates under
                gates = np.arange(i_peaks[0] - 5, i_peaks[0])
                i_base = gates[np.argmin([data[gates[gates >= 0]]])]
                if i_base >= imin:
                    all_bases[i_base] = True
                else:
                    all_peaks[i_peaks[0]] = False
            # update indexes
            i_bases = utils.get_true_indexes(all_bases)
            i_peaks = utils.get_true_indexes(all_peaks)

        # 4. keeps significant couples (base,peak)
        # a layer can be considered as a proper layer if the difference of signal between the peak and the base is significant (larger than the noise level)
        # noise evaluation (using a high passing frequency filter)
        b, a = signal.butter(1, 0.3, btype="high")
        noise = signal.filtfilt(b, a, data)
        # rolling average of the noise
        avg_abs_noise = uniform_filter1d(abs(noise), size=100)
        # make sure we have as many peaks as bases
        if len(i_peaks) != len(i_bases):
            min_len = min([len(i_peaks), len(i_bases)])
            i_peaks = i_peaks[0:min_len]
            i_bases = i_bases[0:min_len]
        bases, peaks = all_bases, all_peaks
        for i in range(len(i_bases)):
            data_around_peak = avg_data[i_peaks[i]]
            data_around_base = avg_data[i_bases[i]]
            if (
                data_around_peak - data_around_base
                <= thr_noise * avg_abs_noise[i_bases[i]]
            ):
                bases[i_bases[i]] = False
                peaks[i_peaks[i]] = False
        # get indexes
        i_bases = utils.get_true_indexes(bases)
        i_peaks = utils.get_true_indexes(peaks)

        # 5. make sure we finish by a peak: remove last base if necessary
        if len(i_bases) > len(i_peaks):
            bases[i_bases[-1]] = False
            i_bases.pop()

        # 6. find tops of clouds layers
        i_bases, i_peaks, i_tops = _find_tops(avg_data, i_bases, i_peaks)

        # 7. merge layers: for a couple of base and peaks 1,2 if data(b2)>data(p1), then merge layers 1 and 2 by removing p1 and b2
        i_bases, i_peaks, i_tops = _merge_layers(avg_data, i_bases, i_peaks, i_tops)

        # 8. find peaks as maximum between base and top    
        i_peaks = [
            i_bases[i] + np.argmax(data[i_bases[i]: i_tops[i]])
            for i in range(len(i_bases))
        ]
        # reconstruct masks
        bases, peaks, tops = _make_all_masks(data, i_bases, i_peaks, i_tops)

        # 9. distinction between aerosol and clouds
        for i in range(len(i_bases)):
            data_around_peak = avg_data[i_peaks[i]]
            data_around_base = avg_data[i_bases[i]]
            if (
                abs((data_around_peak - data_around_base) / data_around_base)
                <= thr_clouds
            ):
                bases[i_bases[i]] = False
                peaks[i_peaks[i]] = False
                tops[i_tops[i]] = False
        # get indexes
        i_bases, i_peaks, i_tops = _get_all_indexes(bases, peaks, tops)

        """
        #10. set base a closest index
        for i, _ in enumerate(i_peaks):
            mask_value = np.asarray([True if gradient[j]<0 else False for j in range(len(data))])
            mask_altitude = np.asarray([True if j<i_peaks[i] else False for j in range(len(data))])
            #the top is the first value that corresponds to the intersection of the two masks
            cross_mask = np.logical_and(mask_value, mask_altitude)
            i_cross_mask = utils.get_true_indexes(cross_mask)
            i_bases[i] = i_cross_mask[-1]
        """

        # 11. check snr at peak levels
        remove_bases, remove_peaks, remove_tops = [], [], []
        for i in range(len(i_peaks)):
            if utils.snr_at_iz(data, i_peaks[i], step=10) < min_snr:
                remove_bases.append(i_bases[i])
                remove_peaks.append(i_peaks[i])
                remove_tops.append(i_tops[i])
        # remove indexes
        i_bases = [i for i in i_bases if i not in remove_bases]
        i_peaks = [i for i in i_peaks if i not in remove_peaks]
        i_tops = [i for i in i_tops if i not in remove_tops]

        # 12. build cloud mask from indexes
        clouds = _mask_cloud(data, i_bases, i_tops)

        """
        #some plotting
        fig, axs = plt.subplots(1,2,figsize=(10,10))

        ymin, ymax = 000, 15000
        altitude_agl = profiles.data.altitude.data - profiles.data.station_altitude.data

        #signal on the left
        axs[0].plot(data, altitude_agl, 'b', label='rcs')
        axs[0].plot(avg_data, altitude_agl, 'c', label='rcs')
        axs[0].plot(avg_abs_noise,altitude_agl,':b', label='noise level')
        axs[0].plot(avg_abs_noise*thr_noise,altitude_agl,':b', label=f'noise level * {thr_noise}')
        axs[0].plot(data[bases], altitude_agl[bases], '<g', label='bases')
        axs[0].plot(data[peaks], altitude_agl[peaks], '>r', label='peaks')
        axs[0].plot(data[tops], altitude_agl[tops], '^k', label='tops')

        #set axis
        axs[0].set_ylim([ymin, ymax])
        #axs[0].set_xlim([-20000,20000])
        axs[0].legend()

        #derivative on the right
        axs[1].plot(ddata_dz, altitude_agl, 'b', label='first derivative')
        axs[1].plot(ddata_dz[bases], altitude_agl[bases], '<g', label='bases')
        axs[1].plot(ddata_dz[peaks], altitude_agl[peaks], '>r', label='peaks')

        axs[1].set_ylim([ymin, ymax])
        axs[1].legend()
        #set title
        fig.suptitle(t,weight='bold')
        """

        return clouds


    def _prepare_data(data, vmin, vmax, target_size):
        # Log transform
        #data = np.log(data)

        # Clip data to the range [vmin, vmax] and then scale it
        data_clipped = np.clip(data, vmin, vmax)

        # Replace NaNs with minimum values
        np.nan_to_num(data_clipped, copy=False, nan=vmin)

        # Invert the normalized data to reflect the 'gray_r' colormap
        data_normalized = (data_clipped - vmin) / (vmax - vmin)
        data_inverted = 1 - data_normalized  # Invert the grayscale values

        # Resize the inverted data to the target size
        resized_data = resize(
            data_inverted,
            output_shape=target_size,
            order=0,  # Nearest-neighbor interpolation to avoid smoothing
            anti_aliasing=False
        )
        return resized_data


    def _ml_clouds(prepared_data, encoder, cluster, target_size, output_shape):
        # Add batch and channel dimensions for further processing
        data_array = np.expand_dims(np.expand_dims(prepared_data, axis=0), axis=-1)

        # Encode the image to get feature representation
        encoded_img = encoder.predict(data_array)[0]  # Remove batch dimension

        # Step 1: Aggregate Encoded Features
        aggregated_encoded_img = np.mean(encoded_img, axis=-1)  # Aggregated to single-channel (16, 32)

        # Optional: Normalize for better visualization
        aggregated_encoded_img = (aggregated_encoded_img - aggregated_encoded_img.min()) / (aggregated_encoded_img.max() - aggregated_encoded_img.min())

        # Step 2: Flatten Encoded Features and Cluster
        encoded_img_flat = encoded_img.reshape(-1, encoded_img.shape[-1])  # Flatten spatial dimensions for clustering
        pixel_labels = cluster.predict(encoded_img_flat)  # Get cluster labels for each pixel

        # Step 3: Map clusters to categories
        category_mapping = {
            1: False,  # molecules
            3: False,  # molecules
            5: False,  # noise
            4: False,  # aerosols
            2: True,  # clouds
            6: True,  # clouds
            0: False,  # other
            7: False   # other
        }
        pixel_labels = np.vectorize(category_mapping.get)(pixel_labels)

        # Reshape the cluster labels back to the spatial dimensions
        pixel_labels_image_shape = pixel_labels.reshape(encoded_img.shape[0], encoded_img.shape[1])

        # Step 4: Upsample the cluster labels to match the original image size
        upsampled_pixel_labels = resize(
            pixel_labels_image_shape,
            (target_size[0], target_size[1]),
            order=0,  # Nearest-neighbor interpolation
            preserve_range=True,
            anti_aliasing=False
        )
        #upsampled_pixel_labels = upsampled_pixel_labels.astype(int)  # Ensure the labels are integers

        # resize upsampled_pixel_labels to original size
        resized_upsampled_pixel_labels = resize(
                upsampled_pixel_labels,
                output_shape=output_shape,
                order=0,  # Nearest-neighbor interpolation to avoid smoothing
                anti_aliasing=False
            )

        return resized_upsampled_pixel_labels

    def _split_matrix(matrix, max_size):
        return [matrix[i:i+max_size] for i in range(0, matrix.shape[0], max_size)]

    def _combine_matrices(matrices):
        return np.vstack(matrices)


    # we work on profiles averaged in time to reduce the noise
    rcs = profiles.time_avg(
        time_avg, var="attenuated_backscatter_0"
    ).data.attenuated_backscatter_0

    # imports and check options
    if method == 'dec':
        import os
        import warnings
        import absl.logging
        from sklearn.exceptions import InconsistentVersionWarning

        # Suppress TensorFlow CUDA messages
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
        os.environ['XLA_FLAGS'] = '--xla_gpu_cuda_data_dir='  # Prevents unnecessary CUDA checks
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "false"  # Avoids some GPU-related logs

        # Suppress Abseil warnings
        absl.logging.set_verbosity(absl.logging.ERROR)

        # Suppress scikit-learn version warnings
        warnings.simplefilter("ignore", InconsistentVersionWarning)

        # Suppress TensorFlow progress bar
        import tensorflow as tf
        tf.keras.utils.disable_interactive_logging()

        import numpy as np
        from tensorflow.keras.models import load_model
        from skimage.transform import resize
        from pathlib import Path
        import joblib

        # ML parameters
        ML = {
            'paths': {
                'encoder': Path(Path(__file__).parent,'ml_models','encoder.keras'),
                'kmeans': Path(Path(__file__).parent,'ml_models','kmeans.pkl')    
            },
            'params': {
                'vmin': -2,
                'vmax': 2,
                'target_size': (256, 512)    
            }
        }

        # we split the rcs into subsets, which artificially increases the time resolution of the cloud detection (otherwise degraded by the DEC technique)
        split_rcs_list = _split_matrix(rcs, 100)

        # Load the encoder and kmeans model
        encoder = load_model(ML['paths']['encoder'])
        cluster = joblib.load(ML['paths']['kmeans'])

        # prepare data
        split_ml_clouds = []
        for split_rcs in split_rcs_list:
            prepared_data = _prepare_data(split_rcs, ML['params']['vmin'], ML['params']['vmax'], ML['params']['target_size'])
            split_ml_clouds.append(_ml_clouds(prepared_data, encoder, cluster, ML['params']['target_size'], output_shape=np.shape(split_rcs)))

        # aggregate split_ml_clouds
        clouds = _combine_matrices(split_ml_clouds)

        options = {
            'encoder': ML['paths']['encoder'],
            'kmeans': ML['paths']['kmeans']
        }

    elif method == 'vg':
        import numpy as np
        if None in (zmin, thr_noise, thr_clouds, min_snr):
            raise ValueError("For method 'vg', zmin, thr_noise, thr_clouds, and min_snr must be provided.")

        clouds = []
        for i in (track(range(len(profiles.data.time.data)), description="clouds",disable=not verbose)):
            i_clouds = _vg_clouds(
                rcs.data[i, :], zmin, thr_noise, thr_clouds, min_snr
            )

            # store info in 2D array
            clouds.append(i_clouds)

        options = {
            'zmin': zmin,
            'thr_noise': thr_noise,
            'thr_clouds': thr_clouds,
            'min_snr': min_snr
        }

    # creates dataarrays
    profiles.data["clouds"] = (("time", "altitude"), clouds)
    profiles.data["clouds"] = profiles.data.clouds.assign_attrs({
        'long_name': 'Clouds mask',
        'units': 'bool',
        'time_avg': time_avg,
        'method': method,
        'options': str(options)
    })
    return profiles
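The noise level used in step 4 of the 'vg' method can be reproduced in isolation: a first-order high-pass Butterworth filter extracts the noise component, and a rolling mean of its absolute value yields an altitude-dependent noise level. The synthetic signal below is illustrative only.

```python
import numpy as np
from scipy import signal
from scipy.ndimage import uniform_filter1d

rng = np.random.default_rng(0)

# synthetic range-corrected signal: smooth exponential decay plus white noise
z = np.linspace(0.0, 1.0, 500)
data = np.exp(-3.0 * z) + 0.05 * rng.standard_normal(500)

# high-pass filter isolates the noise component, as in detect_clouds (method='vg')
b, a = signal.butter(1, 0.3, btype="high")
noise = signal.filtfilt(b, a, data)

# rolling average of |noise| gives a smooth noise level, against which
# each (base, peak) couple is tested for significance
avg_abs_noise = uniform_filter1d(np.abs(noise), size=100)
```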

Planetary Boundary Layer

This method allows for the tracking of the Planetary Boundary Layer (PBL) height in single profiles.

detect_pbl(profiles, time_avg=1, zmin=100.0, zmax=3000.0, wav_width=200.0, under_clouds=True, min_snr=1.0, verbose=False)

Module for Planetary Boundary Layer Height detection. Detects Planetary Boundary Layer Height between zmin and zmax by looking at the maximum vertical gradient in each individual profile.

Parameters:

Name Type Description Default
profiles ProfilesData

ProfilesData instance.

required
time_avg int

in minutes, the time during which we aggregate the profiles before detecting the PBL.

1
zmax float

maximum altitude AGL, in m, for retrieving the PBL.

3000.0
zmin float

minimum altitude AGL, in m, for retrieving the PBL.

100.0
wav_width float

Width of the wavelet used in the convolution, in m.

200.0
under_clouds bool

If True, and if cloud detection has been called before, force the PBL to be found below the first cloud detected in the profile.

True
min_snr float

Minimum SNR at the retrieved PBL height required to return a valid value.

1.0
verbose bool

verbose mode.

False

Returns:

Type Description
ProfilesData

adds the following (xarray.DataArray) to existing (aprofiles.profiles.ProfilesData):

  • 'pbl' (time, altitude): mask array corresponding to the pbl height.
Example
import aprofiles as apro
# read example file
path = "examples/data/L2_0-20000-001492_A20210909.nc"
reader = apro.reader.ReadProfiles(path)
profiles = reader.read()
# pbl detection
profiles.pbl(zmin=100., zmax=3000.)
# attenuated backscatter image with pbl up to 6km of altitude
profiles.plot(show_pbl=True, zmax=6000., vmin=1e-2, vmax=1e1, log=True)

Planetary Boundary Layer Height detection
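detect_pbl relies on a wavelet convolution; a simplified, self-contained sketch of the same idea takes the strongest negative vertical gradient between zmin and zmax. The sigmoid profile and the 1500 m transition below are hypothetical.

```python
import numpy as np

# synthetic backscatter profile: high inside the boundary layer, low above,
# with a sharp transition around a hypothetical PBL height of 1500 m
altitude = np.arange(0.0, 4000.0, 15.0)  # m AGL
profile = 1.0 / (1.0 + np.exp((altitude - 1500.0) / 100.0))

# restrict the search to [zmin, zmax], as detect_pbl does
zmin, zmax = 100.0, 3000.0
valid = (altitude >= zmin) & (altitude <= zmax)

# take the PBL height where the vertical gradient is most negative
gradient = np.gradient(profile, altitude)
i_pbl = np.nanargmin(np.where(valid, gradient, np.nan))
pbl_height = altitude[i_pbl]
print(pbl_height)  # 1500.0
```

The wavelet convolution in the actual implementation plays the same role as the raw gradient here, but is less sensitive to gate-to-gate noise.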

Source code in aprofiles/detection/pbl.py
def detect_pbl(
    profiles,
    time_avg=1,
    zmin=100.0,
    zmax=3000.0,
    wav_width=200.0,
    under_clouds=True,
    min_snr=1.0,
    verbose=False,
):
    """Module for *Planetary Boundary Layer Height* detection.
    Detects the Planetary Boundary Layer Height between zmin and zmax by looking for the maximum vertical gradient in each individual profile.

    Args:
        profiles (aprofiles.profiles.ProfilesData): `ProfilesData` instance.
        time_avg (int, optional): in minutes, the time during which the profiles are aggregated before detecting the PBL.
        zmin (float, optional): minimum altitude AGL, in m, for retrieving the PBL.
        zmax (float, optional): maximum altitude AGL, in m, for retrieving the PBL.
        wav_width (float, optional): width of the wavelet used in the convolution, in m.
        under_clouds (bool, optional): if True, and if cloud detection has been called beforehand, forces the PBL to be found below the first cloud detected in the profile.
        min_snr (float, optional): minimum SNR at the retrieved PBL height required to return a valid value.
        verbose (bool, optional): verbose mode.

    Returns:
        (aprofiles.profiles.ProfilesData): 
            adds the following (xarray.DataArray) to existing (aprofiles.profiles.ProfilesData):

            - `'pbl' (time)`: PBL height, ASL (in m).

    Example:
        ```python
        import aprofiles as apro
        # read example file
        path = "examples/data/L2_0-20000-001492_A20210909.nc"
        reader = apro.reader.ReadProfiles(path)
        profiles = reader.read()
        # pbl detection
        profiles.pbl(zmin=100., zmax=3000.)
        # attenuated backscatter image with pbl up to 6km of altitude
        profiles.plot(show_pbl=True, zmax=6000., vmin=1e-2, vmax=1e1, log=True)
        ```

        ![Planetary Boundary Layer Height detection](../../assets/images/pbl.png)
    """

    from scipy.ndimage import uniform_filter1d

    def _detect_pbl_from_rcs(data, zmin, zmax, wav_width, min_snr):
        # legacy wavelet approach, kept for reference in the string literal below:
        # detect the PBL from the range-corrected signal between zmin and zmax using convolution with a wavelet
        """
        from scipy import signal

        #define wavelet with constant width
        npoints = len(data)
        width = wav_width #in meter
        wav = signal.ricker(npoints, width/profiles._get_resolution('altitude'))

        #simple convolution
        convolution = signal.convolve(data, wav, mode='same')

        #the PBL is the maximum of the convolution
        #sets to nan outside of PBL search range
        convolution[0:profiles._get_index_from_altitude_AGL(zmin):] = np.nan
        convolution[profiles._get_index_from_altitude_AGL(zmax):] = np.nan
        i_pbl = np.nanargmax(abs(convolution))
        """
        #-1. check if any valid data
        if np.isnan(data).all():
            return np.nan

        # 0. rolling average
        avg_data = uniform_filter1d(data, size=10)

        # simple gradient
        gradient = np.gradient(avg_data)

        # the PBL corresponds to the strongest negative vertical gradient
        # set values outside of the PBL search range to nan
        gradient[0:profiles._get_index_from_altitude_AGL(zmin)] = np.nan
        gradient[profiles._get_index_from_altitude_AGL(zmax):] = np.nan
        if not np.isnan(gradient).all():
            i_pbl = np.nanargmin(gradient)
        else:
            return np.nan

        # calculates SNR
        snr = utils.snr_at_iz(data, i_pbl, step=10)
        if snr > min_snr:
            return profiles.data.altitude.data[i_pbl]
        else:
            return np.nan

    # we work on profiles averaged in time to reduce the noise
    rcs = profiles.time_avg(
        time_avg, var="attenuated_backscatter_0", inplace=False
    ).data.attenuated_backscatter_0

    # if under_clouds, check whether cloud detection results are available
    if under_clouds and "clouds" in list(profiles.data.data_vars):
        lowest_clouds = profiles._get_lowest_clouds()
    elif under_clouds and "clouds" not in list(profiles.data.data_vars):
        import warnings

        warnings.warn(
            "under_clouds is set to True (default) but cloud detection has not been applied to the ProfilesData object."
        )
        lowest_clouds = [np.nan for _ in np.arange(len(profiles.data.time))]
    else:
        lowest_clouds = [np.nan for _ in np.arange(len(profiles.data.time))]

    pbl = []
    for i in (track(range(len(profiles.data.time.data)), description="pbl   ", disable=not verbose)):
        lowest_cloud_agl = lowest_clouds[i] - profiles.data.station_altitude.data
        _zmax = np.nanmin([zmax, lowest_cloud_agl])
        pbl.append(
            _detect_pbl_from_rcs(
                rcs.data[i, :],
                zmin,
                _zmax,
                wav_width,
                min_snr,
            )
        )

    # creates dataarrays
    profiles.data["pbl"] = ("time", pbl)
    profiles.data["pbl"] = profiles.data.pbl.assign_attrs({
        'long_name': "Planetary Boundary Layer Height, ASL",
        'units': 'm',
        'time_avg': time_avg,
        'zmin': zmin,
        'zmax': zmax
        })
    return profiles
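
The `utils.snr_at_iz` helper used above is not shown on this page. A minimal sketch of what such an SNR estimate could look like (window mean over window standard deviation around index iz) is given below; this is an assumption for illustration, not the actual aprofiles implementation:

```python
import numpy as np

def snr_at_iz(data, iz, step=10):
    """Estimate the signal-to-noise ratio around index iz as the
    mean over the standard deviation of a window of +/- step points.
    Sketch only; the actual aprofiles.utils implementation may differ."""
    lo, hi = max(iz - step, 0), min(iz + step + 1, len(data))
    window = data[lo:hi]
    std = np.nanstd(window)
    if std == 0 or np.isnan(std):
        return 0.0
    return np.nanmean(window) / std

# a clean, near-constant signal yields a high SNR; pure noise a low one
rng = np.random.default_rng(0)
clean = np.full(100, 5.0) + rng.normal(0, 0.01, 100)
noisy = rng.normal(0, 1.0, 100)
```

In `detect_pbl`, a retrieval is kept only if this ratio at the candidate PBL index exceeds `min_snr`, which filters out gradient minima that sit in pure noise.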