00:00:02.220 --> 00:00:13.170
clyne: providing functionality, and they don't have a lot of cycles for, you know, optimization efforts. So we've kind of thought maybe there's an opportunity here for, like, a student
3
00:00:13.889 --> 00:00:22.110
clyne: intern project; that might be the best way to get some traction on some of the more expensive GeoCAT routines.
4
00:00:23.610 --> 00:00:36.510
Supreeth Madapur Suresh: I think that's a great idea, John. Anyway, the SIParCS projects are coming up, so I'd be happy to work with you guys and submit a project; we can work together on some project.
5
00:00:37.170 --> 00:00:41.880
clyne: That sounds like a great idea. Yeah, let's think about doing that. Thanks, Supreeth.
6
00:00:45.030 --> 00:00:45.990
Mick Coady: Anything else, Supreeth?
7
00:00:50.100 --> 00:00:50.550
Mick Coady: look like.
8
00:00:50.580 --> 00:00:52.020
Supreeth Madapur Suresh: Okay, I'm sorry, I muted again.
9
00:00:54.000 --> 00:00:55.230
Mick Coady: Thanks, Supreeth.
10
00:00:55.800 --> 00:00:56.310
Mick Coady: Daniel.
11
00:00:56.430 --> 00:00:57.300
Mick Coady: You were next.
12
00:01:00.450 --> 00:01:00.840
Mick Coady: You're on mute.
13
00:01:01.680 --> 00:01:02.940
Daniel Howard (CSG, NCAR): My mute button um.
14
00:01:04.350 --> 00:01:14.370
Daniel Howard (CSG, NCAR): My question was on your mentioning the NCL pivot and such, and I noticed on the GeoCAT page you're primarily using conda, etc., which I think is a good move in general.
15
00:01:15.420 --> 00:01:19.560
Daniel Howard (CSG, NCAR): i'm just kind of curious you know shippers, because of course the shame and typically optimization.
16
00:01:20.820 --> 00:01:29.850
Daniel Howard (CSG, NCAR): Have any issues come up so far in terms of the packages that you're putting into conda, as well as the dependencies, and those having optimal,
17
00:01:30.870 --> 00:01:36.360
Daniel Howard (CSG, NCAR): you know, compiled packages for run times on the GPUs, or any other environment, for that matter.
18
00:01:39.600 --> 00:01:43.050
clyne: You know, I'm sorry, I'm not sure I understood the question.
19
00:01:44.220 --> 00:01:50.910
Daniel Howard (CSG, NCAR): Yes, so, I guess NCL currently primarily runs with, like, pip install, which is also an option with GeoCAT.
20
00:01:51.720 --> 00:02:02.130
Daniel Howard (CSG, NCAR): But, you know, as of late there have been a lot of transitions towards conda and mamba for managing those package installations, and I'm just curious if it's been your experience that
21
00:02:03.060 --> 00:02:16.380
Daniel Howard (CSG, NCAR): the kind of packages you're installing and using in that ecosystem are running effectively so far, like optimally, on the GPU systems, because every now and then Python packages just get installed kind of suboptimally.
22
00:02:17.550 --> 00:02:21.720
Daniel Howard (CSG, NCAR): Just in the sense of how the configurations are set for the dependencies, etc.
23
00:02:22.530 --> 00:02:24.840
clyne: got it yeah so I really can't you know.
24
00:02:26.190 --> 00:02:32.070
clyne: To the best of my knowledge, the conda installations are working just fine, but again, we don't have any GPU
25
00:02:33.210 --> 00:02:38.310
clyne: code in there now; none of the existing GeoCAT packages
26
00:02:39.390 --> 00:02:40.350
clyne: explicitly.
27
00:02:41.610 --> 00:02:44.610
clyne: make use of the GPUs. I can only speak to...
28
00:02:44.970 --> 00:02:48.150
Daniel Howard (CSG, NCAR): Oh, I meant not specifically the GPUs, or even just the CPUs for now, perhaps.
29
00:02:48.780 --> 00:02:59.700
clyne: Yeah, I don't know that there's been any kind of, you know, benchmarking effort done to see what the performance looks like relative to other installations, and, you know, I wasn't even
30
00:03:00.480 --> 00:03:05.880
clyne: sure that anybody even considered that there might be differences in performance. That's an interesting point.
31
00:03:07.020 --> 00:03:09.420
Daniel Howard (CSG, NCAR): I think just something to think about like you know.
32
00:03:10.680 --> 00:03:21.690
Daniel Howard (CSG, NCAR): since you were mentioning optimization-type projects: I've encountered certain user issues with Python packages where, you know, a package was installed without linking to certain BLAS libraries, and
33
00:03:22.710 --> 00:03:34.380
Daniel Howard (CSG, NCAR): they'd be like, "Oh, it takes two hours," and, you know, if you did this, now it's about 90 seconds. So those are the things that can be useful to think about in terms of how those Python packages are being put together.
34
00:03:35.490 --> 00:03:35.940
clyne: gotcha.
35
00:03:42.750 --> 00:03:43.500
Mick Coady: Yes, Supreeth.
36
00:03:45.060 --> 00:03:46.170
Supreeth Madapur Suresh: Yeah, I just wanted to
37
00:03:47.220 --> 00:04:03.060
Supreeth Madapur Suresh: talk about the comment that Brian Dobbins put in. I think the porting guidance definitely involves the use case, the language the code is written in, and also the nature of the code. There are a lot of porting avenues, I think.
38
00:04:04.620 --> 00:04:16.050
Supreeth Madapur Suresh: I'm just curious, I just wanted to start a discussion on how we could standardize this, or get more specific for porting guidance.
39
00:04:19.950 --> 00:04:28.620
Brian Dobbins: Um, yeah, so I guess, obviously, I have slightly different interests than John Clyne. I'll specify the John since there are so many now.
40
00:04:29.430 --> 00:04:43.230
Brian Dobbins: I think, yeah, language is obviously clear, because if you're using C++, you know, maybe CUDA is a better option or something, but it seems even just between OpenACC and OpenMP, I don't know enough about
41
00:04:44.280 --> 00:04:49.050
Brian Dobbins: what the differences are and what the advantages of one over the other are, if there are any.
42
00:04:49.410 --> 00:04:54.840
Brian Dobbins: And for the HPC paradigms there are things like Kokkos, and I remember looking at some of the SIParCS presentations, and
43
00:04:55.200 --> 00:05:05.040
Brian Dobbins: you know, there was Kokkos and DPC++, and I just had a lot of questions, as some of the results looked really good and some were a little more concerning, and
44
00:05:05.580 --> 00:05:10.800
Brian Dobbins: but it's also a summer internship. So just getting sort of a, you know, "here's the status of where these things are"
45
00:05:11.130 --> 00:05:29.130
Brian Dobbins: for HPC codes would be useful to me. But I think even for John, like I said, John Clyne, you know, for VAPOR it's a different use case, so just general guidance would be good, because I know only that I don't know enough. So thank you, though.
46
00:05:30.030 --> 00:05:34.230
clyne: Yeah, even a flow chart would be helpful, you know, some kind of decision tree
47
00:05:35.730 --> 00:05:41.340
clyne: to provide general guidance for, you know, which direction to take. I think that would be helpful.
48
00:05:43.500 --> 00:05:58.080
Supreeth Madapur Suresh: Okay, I think, yeah, that's a really good idea, a flow chart or something of that kind. I think we can work together and come up with one, and I can probably present it in a Word doc or something.
49
00:05:58.770 --> 00:06:00.450
Supreeth Madapur Suresh: So I think that's a great idea.
50
00:06:00.870 --> 00:06:05.430
John Dennis (he/him): Or we can present it in another GT meeting at a later date.
51
00:06:06.330 --> 00:06:07.920
Supreeth Madapur Suresh: Yeah, that also sounds good, yeah.
52
00:06:08.400 --> 00:06:22.500
John Dennis (he/him): Yeah, so I had one more question here, and I think I know the answer to this, but I just wanted to ask. So, Supreeth, you mentioned that some of GeoCAT's functions were ported to GPU in the past.
53
00:06:24.480 --> 00:06:28.020
John Dennis (he/him): John, have those been integrated or returned to you?
54
00:06:30.930 --> 00:06:34.950
Supreeth Madapur Suresh: I think it was more in NCL, rather than in GeoCAT.
55
00:06:35.550 --> 00:06:46.770
John Dennis (he/him): Okay, because, I mean, I think that would be something I would really like to get going as kind of a pipeline, so that, you know, a student makes progress on a GeoCAT
56
00:06:48.000 --> 00:06:53.730
John Dennis (he/him): function or a class of functions and then it kind of feeds into future releases.
57
00:06:56.310 --> 00:07:09.480
clyne: Yeah, and I think the GeoCAT team is very well positioned now to do things like that. I mean, one of the big, you know, changes in moving from NCL to GeoCAT is
58
00:07:10.290 --> 00:07:19.710
clyne: you know, GeoCAT has really fully embraced open development and, in particular, continuous integration services, so
59
00:07:21.270 --> 00:07:32.910
clyne: testing and validating changes in the code is now relatively straightforward. So, you know, if there was an alternate implementation that used NVIDIA, so the GPU, you know,
60
00:07:34.380 --> 00:07:44.880
clyne: demonstrating correctness and performance is pretty straightforward, and there's, you know, a path for contributors: a fully documented contributor's guide that says, this is how you go about,
61
00:07:46.020 --> 00:07:59.880
clyne: you know, providing new capability. So I think, again, the GeoCAT team is probably in a much better position now to accept outside contributions than, you know, the NCL team was when Supreeth
62
00:08:01.020 --> 00:08:05.520
clyne: was able to, you know, work with students to do their experimentation.
63
00:08:06.210 --> 00:08:07.290
John Dennis (he/him): Cool, thank you.
64
00:08:10.440 --> 00:08:15.990
Mick Coady: Okay, excellent. John D., you've still got your hand up, but I assume you're done with that.
65
00:08:16.500 --> 00:08:18.210
John Dennis (he/him): Done. I just haven't gotten there yet.
66
00:08:18.390 --> 00:08:24.150
Mick Coady: Okay, okay. Anybody else? Any other questions for John? Great.
67
00:08:26.850 --> 00:08:40.710
Mick Coady: All right, thanks again. I'll turn it over to John Blaas now for a talk about the development/testing queue. John, I've made you co-host, so you should be able to share your screen, all right?
68
00:08:41.940 --> 00:08:43.920
John Blaas: Just give me one moment.
69
00:09:08.730 --> 00:09:19.770
Mick Coady: While we're waiting for John Blaas: John Clyne, feel free to reach out to me or anybody in our group in consulting whenever you're ready to talk, okay?
70
00:09:20.880 --> 00:09:22.050
clyne: Will do. Thank you again.
71
00:09:22.290 --> 00:09:22.680
Mick Coady: mm hmm.
72
00:09:25.590 --> 00:09:27.270
John Blaas: All right, can everyone see the slide?
73
00:09:28.530 --> 00:09:28.980
Mick Coady: Yes.
74
00:09:29.340 --> 00:09:29.670
alright.
75
00:09:31.290 --> 00:09:38.670
John Blaas: So I'll just skip past the peer scheduling part. I'm sure most of you probably
76
00:09:39.690 --> 00:09:50.190
John Blaas: went to the talk yesterday, but just a brief overview: we'll be allowing cross-submissions in the near future.
77
00:09:51.210 --> 00:09:59.580
John Blaas: But as of right now, you can do some cool things like get a qstat of jobs from the other cluster
78
00:10:01.080 --> 00:10:10.770
John Blaas: and things like that. But what I'd really like to talk to you about today is the gpudev queue. So we are putting together a gpudev queue,
79
00:10:11.580 --> 00:10:29.850
John Blaas: which will be available from 8 a.m. to 5:30 p.m., Monday through Friday, during normal business hours. You will be able to submit one job to this queue and have that run; any subsequent jobs will either be held or deleted.
80
00:10:31.800 --> 00:10:52.140
John Blaas: The max wall time for these jobs is set to 30 minutes, so these are, you know, meant for quickly debugging any problematic GPU workflows or doing some tuning, and you can request anywhere from one to four GPUs in this queue for development work.
81
00:10:53.640 --> 00:11:03.870
John Blaas: And right there below, I have a sample qsub interactive command that would request a job from this queue.
82
00:11:06.210 --> 00:11:19.320
John Blaas: That's about all I have to say about the gpudev queue. Hopefully this helps a lot of people with doing some debugging and tuning of their GPU codes before they submit
83
00:11:20.610 --> 00:11:22.050
John Blaas: any larger runs.
84
00:11:23.280 --> 00:11:24.330
John Blaas: Any questions?
85
00:11:25.590 --> 00:11:27.450
Mick Coady: Yeah, John Dennis.
86
00:11:27.720 --> 00:11:34.320
John Dennis (he/him): Yeah, so I'm aware of a problem that this may,
87
00:11:35.550 --> 00:11:41.460
John Dennis (he/him): may help fix, so I'm wondering when you'll be ready for
88
00:11:42.960 --> 00:11:43.860
John Dennis (he/him): friendly users.
89
00:11:44.970 --> 00:11:45.600
John Blaas: Right now.
90
00:11:46.110 --> 00:11:47.280
John Blaas: So if you want to.
91
00:11:48.270 --> 00:11:56.280
John Blaas: If you want to try it out right now, please go ahead, and then just let me know if you have any issues using that queue.
92
00:11:56.760 --> 00:11:58.290
John Dennis (he/him): Okay, cool, thank you.
93
00:11:59.460 --> 00:12:00.360
Mick Coady: I do want to.
94
00:12:01.590 --> 00:12:23.640
Mick Coady: mention that these guidelines or parameters that John listed, you know, 8 to 5:30 and the 30-minute wall-clock limit, those were our best guess. We debated this a lot within HPCD about what these parameters should be and settled on this as a starting point, but if
95
00:12:24.660 --> 00:12:35.730
Mick Coady: the user community, which is, you know, what you guys represent, feels like any of these need to be tweaked in the future, for example, the four-GPU limit that we've put on
96
00:12:36.990 --> 00:12:58.020
Mick Coady: the queue, if demand warrants, we will look at increasing the number of GPUs that will be available. And if 30 minutes is too much or not enough, then we can revisit that as well. So we're very anxious for your feedback. So, yeah, Brian, I think you were next.
97
00:12:58.740 --> 00:13:05.400
Brian Vanderwende: Yeah, just two comments. One, I guess, is just a note on what you said, Mick. I agree wholeheartedly, but
98
00:13:06.390 --> 00:13:19.830
Brian Vanderwende: one thing to just keep in mind is that this uses one of the four-way V100 GPU nodes, so if we were to add resources to this queue and, say, enable eight GPUs, then it would reduce the amount in the gpgpu queue
99
00:13:21.240 --> 00:13:30.060
Brian Vanderwende: during work hours. So it is a trade-off, but I agree with Mick entirely about it being flexible. And the other thing I wanted to point out was,
100
00:13:31.500 --> 00:13:46.170
Brian Vanderwende: I've also updated the execcasper script. People like using that to get an interactive session. If you request the gpudev queue with -q, it will
101
00:13:47.430 --> 00:14:00.630
Brian Vanderwende: alter the default settings that it imposes, so that it will, you know, give you basically a 30-minute wall time and a single V100 GPU on that queue. So that would be another, shorter-syntax way of accessing this.
102
00:14:03.990 --> 00:14:05.130
Mick Coady: Thanks, Brian, yeah.
103
00:14:09.120 --> 00:14:10.440
Mick Coady: Cena, I think you were next.
104
00:14:12.900 --> 00:14:18.360
cmille73: Um, well, first of all, thanks for setting this up.
105
00:14:19.560 --> 00:14:32.880
cmille73: It feels like a real action that came out of some of our talks these past few weeks, so I'm excited to try it. And then my question is, I guess I'm just kind of ignorant about how the queues work, so
106
00:14:34.530 --> 00:14:34.980
cmille73: is there,
107
00:14:36.150 --> 00:14:44.850
cmille73: and Brian might have partially answered this, but is there a set amount of resources that are set aside for this development queue? I guess, how many?
108
00:14:46.350 --> 00:15:07.230
John Blaas: Yeah, so it's one four-way V100 node that's dedicated to this, so, you know, it's a maximum of four development GPU jobs in the queue that can be running at any given time, if all those jobs are only using one GPU, if that makes any sense.
109
00:15:07.560 --> 00:15:07.770
yeah.
110
00:15:10.200 --> 00:15:10.620
cmille73: Thanks.
111
00:15:11.730 --> 00:15:21.450
Mick Coady: John Blaas, can you address what happens if, let's say, Cena or someone were to submit, let's say, five jobs to the queue? Do
112
00:15:21.870 --> 00:15:27.180
Mick Coady: the other four just go into a queued state, or what happens?
113
00:15:27.660 --> 00:15:45.120
John Blaas: So the first job will go through and get put into the queue, but any other jobs submitted after that, while that job is either queued or still running, will get deleted within 30 seconds.
114
00:15:45.990 --> 00:15:51.720
Mick Coady: Okay, so it won't go into a hold state or something like that? No? Okay, all right.
115
00:15:53.280 --> 00:16:01.680
Mick Coady: I just wanted to get that cleared up with everybody, so there are no unpleasant surprises when they try.
116
00:16:03.690 --> 00:16:06.360
Mick Coady: Supreeth, you had your hand up for a minute.
117
00:16:08.190 --> 00:16:12.210
Supreeth Madapur Suresh: Oh, I think I just wanted to ask about the
118
00:16:13.980 --> 00:16:19.710
Supreeth Madapur Suresh: criteria that were chosen to set the 5:30, but you kind of answered it: it is trial and error, so...
119
00:16:22.320 --> 00:16:24.210
Mick Coady: The 8-to-5:30 time frame?
120
00:16:25.110 --> 00:16:26.280
Supreeth Madapur Suresh: Yeah, because
121
00:16:26.520 --> 00:16:31.410
Supreeth Madapur Suresh: after 5:30 the nodes become free, is that the case?
122
00:16:34.020 --> 00:16:42.600
Mick Coady: Well, if I try and speak for the collective thinking here, it was that we saw this as primarily a prime-time
123
00:16:43.620 --> 00:16:50.850
Mick Coady: queue, for working hours, in that after that, the demand, like you said, a lot of the other jobs,
124
00:16:51.900 --> 00:17:03.900
Mick Coady: a lot of those resources normally free up for this kind of work. So it was, again, our best guess at what would work for the greater good.
125
00:17:05.370 --> 00:17:11.580
Mick Coady: And, you know, we can revisit this, like Brian and I mentioned. Any of these
126
00:17:13.050 --> 00:17:18.930
Mick Coady: sideboards that we put on there, we can revisit based on the feedback we'll get from you guys.
127
00:17:22.320 --> 00:17:25.920
Mick Coady: Yeah, John Dennis. Okay, sure, Supreeth? John Dennis.
128
00:17:26.130 --> 00:17:32.700
John Dennis (he/him): So this is maybe a little bit off topic, but, you know, NVIDIA provided us a magic script for MPS startup.
129
00:17:33.780 --> 00:17:36.420
John Dennis (he/him): And we are thinking that it could actually be run
130
00:17:38.820 --> 00:17:44.550
John Dennis (he/him): As part of the job submission process is that magic script and educated and.
131
00:17:46.950 --> 00:17:49.980
John Dennis (he/him): If you go ha and that maybe that's a better.
132
00:17:51.240 --> 00:17:51.960
John Dennis (he/him): vision.
133
00:17:52.740 --> 00:18:01.380
Brian Vanderwende: Yeah, no, John, we know exactly what you're talking about. This has kind of been a thing that's, I guess, on the to-do list for us, but
134
00:18:03.090 --> 00:18:04.800
Brian Vanderwende: it just hasn't percolated to the top.
135
00:18:05.370 --> 00:18:05.790
Okay.
136
00:18:09.120 --> 00:18:16.530
Brian Vanderwende: But I would say, if you want to forward that magic script, I would be interested to see how it compares to what I'm thinking of,
137
00:18:18.570 --> 00:18:19.320
Brian Vanderwende: just to make sure.
138
00:18:20.520 --> 00:18:22.020
John Dennis (he/him): yeah I think I think.
139
00:18:23.250 --> 00:18:26.760
John Dennis (he/him): Supreeth has access to the magic script, right?
140
00:18:28.470 --> 00:18:32.940
Supreeth Madapur Suresh: Yeah, I can send it to you, send it to everyone in Slack.
141
00:18:34.350 --> 00:18:35.490
Brian Vanderwende: Sounds great, Supreeth, thank you.
142
00:18:39.810 --> 00:18:49.590
John Dennis (he/him): But, I mean, you know, we want it to be available to everybody that's on this call, right? All the GT and all other GPU users, so...
143
00:18:52.290 --> 00:18:52.620
Mick Coady: yeah.
144
00:18:53.130 --> 00:18:53.520
yeah.
145
00:18:56.760 --> 00:19:00.450
Mick Coady: Any other comments or questions before we move on.
146
00:19:03.900 --> 00:19:04.440
Okay.
147
00:19:06.570 --> 00:19:07.230
Mick Coady: So I'll
148
00:19:08.520 --> 00:19:12.360
Mick Coady: go back to sharing here.
149
00:19:15.960 --> 00:19:37.680
Mick Coady: The last item on my agenda here, anyway: I think Jeremy and John Dennis I alerted to this last week, but we recently added a new table to the Casper area of CISL's resource status page.
150
00:19:38.760 --> 00:19:59.670
Mick Coady: For those of you that have maybe taken a look before now, you're familiar with this particular table that shows what's going on in which queue, you know, number of jobs. I modeled it a little bit after what we had done on the Cheyenne queues, so
151
00:20:01.470 --> 00:20:12.600
Mick Coady: as of, I think, a week or so ago, maybe 10 days ago, if you now go to that page, you'll see that area looks more like this, all right,
152
00:20:13.290 --> 00:20:24.750
Mick Coady: with the new table showing the status of the V100 nodes. I'm going to move on and blow that up a little bit so it's not so much of an eye test.
153
00:20:25.710 --> 00:20:34.980
Mick Coady: And the thinking here is that it will show what's going on with the available V100 nodes, since those
154
00:20:36.150 --> 00:20:37.920
Mick Coady: have been a.
155
00:20:40.080 --> 00:20:45.240
Mick Coady: point of contention for resources, right? We're well aware that
156
00:20:46.260 --> 00:20:50.820
Mick Coady: the demand has outstripped the availability of resources of late.
157
00:20:51.840 --> 00:20:52.440
Mick Coady: So.
158
00:20:53.820 --> 00:21:11.970
Mick Coady: This is just to help you and CISL, you know, the consulting group and sysadmins, to get a better idea of what's going on when we start getting questions, slash complaints, about the availability of the nodes, so
159
00:21:13.500 --> 00:21:24.330
Mick Coady: hopefully you'll find this interesting. I've gotten feedback from Jeremy, and I think, Jeremy, if I interpreted your feedback correctly, it was positive, so
160
00:21:25.350 --> 00:21:33.510
Mick Coady: if you have any other, like, "Oh, it'd be nice if it did such and such," or if it included more information,
161
00:21:34.740 --> 00:21:46.380
Mick Coady: I'm all ears on that. The one thing I tried to add, but it was going to take too long and I couldn't quite get my head around the problem, was how to
162
00:21:47.670 --> 00:21:58.890
Mick Coady: show which, if any, of the nodes or any of the GPUs themselves were reserved. I'll continue to look into that and see if I can find a
163
00:22:00.300 --> 00:22:03.630
Mick Coady: nice way to add that information in so.
164
00:22:06.270 --> 00:22:09.840
Mick Coady: Any questions or any other feedback?
165
00:22:12.540 --> 00:22:13.260
Mick Coady: Yeah, Brian.
166
00:22:14.430 --> 00:22:16.950
Brian Vanderwende: Yeah, I suppose I could take this offline too, but
167
00:22:18.060 --> 00:22:27.690
Brian Vanderwende: is the gpudev node listed here? And if it is, perhaps we should make it, you know, identified as such.
168
00:22:27.960 --> 00:22:39.300
Mick Coady: Oh, that's excellent, that's a great point. I hadn't thought of that. Yes, it is listed here. What you see here is all the V100 nodes, I think 10 of them.
169
00:22:40.470 --> 00:22:41.160
Mick Coady: So.
170
00:22:44.460 --> 00:22:49.620
Mick Coady: I will work on that, but that's a really good suggestion.
171
00:22:52.290 --> 00:23:03.810
Mick Coady: If you look at the state column there, it'll show "down" in red if one of those nodes is offline for whatever reason.
172
00:23:05.400 --> 00:23:07.020
Mick Coady: The only thing I could think of
173
00:23:08.340 --> 00:23:09.390
Mick Coady: to add to this,
174
00:23:10.620 --> 00:23:19.710
Mick Coady: but couldn't come up with a good way to do it, was to show nodes or particular GPUs that were reserved, so...
175
00:23:27.750 --> 00:23:28.440
Mick Coady: Yeah, Jeremy.
176
00:23:30.570 --> 00:23:40.740
Jeremy Sauer: Thanks, Mick. Indeed, I think this is a nice little resource. You know, one could show in there, I think, one thing that I certainly, um,
177
00:23:41.640 --> 00:23:53.250
Jeremy Sauer: I don't know, I hate to call it a complaint, but I certainly see it, is when people, usually it's a new user to the V100 nodes, it'll be a username, you know, we haven't seen before,
178
00:23:54.060 --> 00:24:04.110
Jeremy Sauer: they just, you know, submit a job and they get a node, and then the GPU will sit there flatlining on utilization, so they're occupying one of these precious few V100 nodes
179
00:24:05.730 --> 00:24:07.290
Jeremy Sauer: and not really doing anything.
180
00:24:09.060 --> 00:24:11.040
Jeremy Sauer: So I don't know if you might think about.
181
00:24:12.450 --> 00:24:27.660
Jeremy Sauer: I don't know if it makes sense to add to this table something like an average GPU utilization, and if you were seeing something of a zero utilization for a period of time... Maybe that's overkill in terms of information. But I have to say, recently
182
00:24:28.770 --> 00:24:38.880
Jeremy Sauer: things seem much improved. The scheduler's a lot better about packing single-GPU jobs and keeping full eight-way nodes open for, you know,
183
00:24:39.510 --> 00:24:53.070
Jeremy Sauer: actual jobs that can use eight GPUs at a time or more. So I think there's been a real nice improvement on things, and things are probably about as good as they're going to get before Derecho arrives.
184
00:24:54.210 --> 00:25:08.670
Jeremy Sauer: And just one more comment, or I guess thought, is: I know that some people get frustrated because there are actual production science jobs occurring on these GPUs and
185
00:25:09.360 --> 00:25:16.770
Jeremy Sauer: it's hard to find a GPU to develop on during work hours, thus the gpudev queue.
186
00:25:17.640 --> 00:25:28.050
Jeremy Sauer: Do we have an idea of how the P100s are being utilized? Because, I guess, from FastEddy's perspective, to be perfectly honest, we developed on P100s.
187
00:25:29.010 --> 00:25:38.580
Jeremy Sauer: We were doing development before V100s even existed, so I'm curious why people think they have to have a V100 to do GPU development.
188
00:25:42.270 --> 00:25:45.570
Mick Coady: It's a great question. I'll turn that over to the group.
189
00:25:47.010 --> 00:25:48.660
Mick Coady: I don't have a good answer for you.
190
00:25:53.010 --> 00:26:07.950
Davide Del Vento: I don't have any answer to the second one. I think it's probably just people not knowing about that option, and maybe, you know, we advertise those GPUs a lot for visualization.
191
00:26:09.450 --> 00:26:11.430
Davide Del Vento: But instead, on the first one,
192
00:26:13.440 --> 00:26:19.290
Davide Del Vento: I think that's overkill information to provide in this table, which is user-facing.
193
00:26:20.520 --> 00:26:30.900
Davide Del Vento: But maybe we should do that in our internal monitoring and, you know, actually monitor many other things, and maybe reach out to those users privately,
194
00:26:31.350 --> 00:26:44.010
Davide Del Vento: like we do for other things. As far as I know, we don't do that at the moment, and correct me, somebody, maybe John Blaas or somebody else in systems, if I'm mistaken there.
195
00:26:46.530 --> 00:26:47.430
Ben Matthews: We collect that data.
196
00:26:50.130 --> 00:26:52.320
Mick Coady: Can you say that again, Ben? I didn't catch it.
197
00:26:53.580 --> 00:26:56.910
Ben Matthews: We do collect GPU utilization, yeah. I don't know
198
00:26:57.990 --> 00:26:59.190
Ben Matthews: that we do anything with it, but it's available.
199
00:26:59.970 --> 00:27:09.900
Davide Del Vento: Right, but what I'm saying is, you know, if you see something odd with memory utilization or something else, you tell us and we reach out to people and say, you know, "What are you doing?" And we do that.
200
00:27:10.380 --> 00:27:27.870
Davide Del Vento: But we don't do that for GPU, or I haven't seen it; maybe it hasn't happened when I am on duty. So that's what I'm asking as a final. I know you collect everything, but do we use it for this purpose? Maybe we don't. If we don't, maybe we should start.
201
00:27:30.420 --> 00:27:33.450
Ben Matthews: routinely defining we, it is a challenge.
202
00:27:35.430 --> 00:27:44.460
Mick Coady: Yeah, I'd say Ben, in particular, does a very good job of collecting the data, but as a group
203
00:27:45.690 --> 00:27:54.240
Mick Coady: we don't do, I think, a good job of monitoring usage through the day, and I think that's what is the
204
00:27:55.980 --> 00:28:01.620
Mick Coady: thorn in Jeremy's and other, you know, heavy users', our power users',
205
00:28:02.820 --> 00:28:14.400
Mick Coady: side, and this table is an attempt to start giving us those kinds of tools to do that.
206
00:28:16.440 --> 00:28:31.830
John Blaas: We're also looking at ways to capture utilization of the GPUs throughout the runtime of a job and start putting the information into the accounting records as part of our preparation for Derecho as well.
207
00:28:32.760 --> 00:28:43.200
John Blaas: So we are working on it and, you know, improving the way in which we act on that data, but at this point it's still a work in progress.
208
00:28:43.860 --> 00:28:48.060
Mick Coady: That's a really good point, John. And to,
209
00:28:49.530 --> 00:28:54.990
Mick Coady: I think, help partially address one of Jeremy's comments:
210
00:28:56.670 --> 00:29:18.870
Mick Coady: right now, and still to this day, there's no penalty for a user to grab a GPU and not use it; there's no charging for it, right? So there's no perceived cost to them, although there is to the rest of the user community, and we're working actively with
211
00:29:19.950 --> 00:29:25.050
Mick Coady: the group that maintains SAM to change that, so that,
212
00:29:27.570 --> 00:29:33.780
Mick Coady: so that they do get charged. As you know, if you're using the resources, you should get charged.
213
00:29:35.100 --> 00:29:37.470
Mick Coady: And then also that'll help us monitor the
214
00:29:38.610 --> 00:29:39.660
Mick Coady: problem users.
215
00:29:43.680 --> 00:29:57.810
Jeremy Sauer: Actually, could I just say one more thing? I mean, just to follow up and reiterate, I'm super pleased with some of the adjustments to the scheduler, trying to pack nodes and things like that. And I think on this zero-utilization
216
00:29:59.310 --> 00:30:06.780
Jeremy Sauer: type of situation, it's not all that common of an occurrence, but it can hit sometimes at,
217
00:30:07.980 --> 00:30:14.310
Jeremy Sauer: you know, inopportune moments. For example, you know, we get into the weekends and some folks are trying to do,
218
00:30:15.300 --> 00:30:22.170
Jeremy Sauer: you know, a bunch of runs, and if, you know, 28 GPUs are taken up by a new user who
219
00:30:22.800 --> 00:30:28.770
Jeremy Sauer: has spun up seven four-GPU jobs that are doing nothing, it's like, "Oh, but what do we do? It's the weekend," right?
220
00:30:29.190 --> 00:30:40.320
Jeremy Sauer: So um I don't think it's a common case, I think we as a user community can help to monitor, probably in in the interim, as opposed to you know.
221
00:30:41.100 --> 00:30:54.300
Jeremy Sauer: Putting it all on CISL. GPU users can pay attention to the queue, see how things are being used, and, you know, just try and provide feedback where possible and, hopefully, you know, we'll just.
222
00:30:55.530 --> 00:31:10.380
Jeremy Sauer: wind up in, you know, the best situation possible, where the resources are being as effectively utilized as possible. I think, you know, the current status of the table there on the slide is pretty good: 63 out of 64 GPUs in use.
223
00:31:11.070 --> 00:31:18.540
Jeremy Sauer: yeah there were many a day when that was not the case, so be nice good stuff good thanks for all the efforts yeah yeah.
224
00:31:18.840 --> 00:31:27.540
Mick Coady: A lot of those changes to the scheduler, you can thank John Blaas for, with help from Ben and the rest of their colleagues, so.
225
00:31:28.770 --> 00:31:32.220
Mick Coady: it's it's it's it's a process right, not an event.
226
00:31:34.200 --> 00:31:36.180
You improve these so.
227
00:31:38.520 --> 00:31:54.090
Mick Coady: that's all I had for today if anybody has anything else, they want to bring up talk about be glad to stick around with you for a little bit and otherwise what i'll stop sharing here in.
228
00:31:55.830 --> 00:31:59.190
Mick Coady: Delivery oh yeah john Dennis sorry didn't see your hand up.
229
00:32:00.150 --> 00:32:02.640
John Dennis (he/him): Why, I just raised it so.
230
00:32:04.290 --> 00:32:11.940
John Dennis (he/him): I was in a meeting earlier this week or last week and we were discussing how to use PCAST, and I felt like it was a meeting with.
231
00:32:12.870 --> 00:32:24.540
John Dennis (he/him): Three or four expert GPU users, and there was confusion about how to properly use PCAST, so, you know, we were like.
232
00:32:24.960 --> 00:32:35.820
John Dennis (he/him): I'm following the directions that were valid three years ago, and then they're like, oh, those have been taken down and it's done completely differently now. So I'm curious.
233
00:32:38.490 --> 00:32:48.360
John Dennis (he/him): about this group: do you know what PCAST is? Do you feel comfortable using it? Is this maybe a topic for a future meeting?
234
00:32:54.300 --> 00:32:57.060
Mick Coady: I'll admit that I don't know what PCAST is.
235
00:32:57.810 --> 00:32:58.470
Okay.
236
00:32:59.790 --> 00:33:06.030
Mick Coady: I don't know that I'm very representative of the power users in this group, but.
237
00:33:07.560 --> 00:33:09.780
Mick Coady: At least try and get that conversation going here.
238
00:33:10.710 --> 00:33:15.360
John Dennis (he/him): So maybe there's not a whole lot of knowledge about PCAST and.
239
00:33:17.430 --> 00:33:20.910
John Dennis (he/him): outside of the group of people that actually didn't know how to use it.
240
00:33:23.760 --> 00:33:24.000
Daniel Howard (CSG, NCAR): So.
241
00:33:26.130 --> 00:33:34.740
Daniel Howard (CSG, NCAR): So that's basically, in the background, I guess, doing like that autocompare function? And, if you do have some time, are there other features that are available in PCAST?
242
00:33:35.490 --> 00:33:44.490
John Dennis (he/him): Well, yeah, there's three different ways to run PCAST. One is the autocompare, and then there's other features on how to run it, so.
243
00:33:48.780 --> 00:33:49.680
Mick Coady: unless someone.
244
00:33:50.850 --> 00:33:51.240
Mick Coady: Has.
245
00:33:52.980 --> 00:33:58.530
Mick Coady: Other thoughts or objections, I think it sounds like a good candidate for a future meeting.
246
00:33:58.860 --> 00:34:08.490
John Dennis (he/him): Okay, yeah, because basically PCAST allows you to do porting work and check correctness incrementally, and so it's highly useful.
247
00:34:10.050 --> 00:34:16.440
John Dennis (he/him): To basically find your bugs, if you understand how PCAST is operating.
248
00:34:17.880 --> 00:34:23.070
Irfan: Also, Mick, if it's helpful, we can set up or schedule, ask NVIDIA.
249
00:34:24.090 --> 00:34:33.480
Irfan: To talk about it, because I know they have a bunch of slides on it and how it can generate the compiler code for both CPU and GPU.
250
00:34:34.350 --> 00:34:35.790
Mick Coady: Yes, you.
251
00:34:35.880 --> 00:34:37.470
Mick Coady: Good I do things.
252
00:34:37.830 --> 00:34:41.910
John Dennis (he/him): And Supreeth mentioned that there is a section in the tutorial.
253
00:34:43.470 --> 00:34:45.690
John Dennis (he/him): gpu tutorial that they put together but.
254
00:34:47.190 --> 00:34:48.150
John Dennis (he/him): Anyway, I.
255
00:34:49.500 --> 00:34:52.320
John Dennis (he/him): it's it's it's a highly useful tool Aaron.
256
00:34:54.750 --> 00:34:59.070
John Dennis (he/him): But I've been confused by it frequently, so.
257
00:35:00.870 --> 00:35:08.520
Mick Coady: Well, I think, to me that's totally in keeping with our community of practice the direction right.
258
00:35:08.850 --> 00:35:16.380
Mick Coady: Yes, so if you found it useful than others might as well yeah I bet they would so.
259
00:35:16.950 --> 00:35:28.920
Irfan: yeah Thank you john I think this is exactly why we have cheeky to bring these kind of ideas, so we can find the necessary resources to answer your questions and crazy so.
260
00:35:29.460 --> 00:35:42.150
Irfan: Maybe at the next NVIDIA meeting, Brian and Mick, we can bring this up with them as well, and maybe Supreeth and Sina also have some information that they can share at the next meeting.
261
00:35:43.290 --> 00:35:43.620
Mick Coady: yep.
262
00:35:44.610 --> 00:35:45.540
Supreeth Madapur Suresh: yeah I think we.
263
00:35:45.840 --> 00:35:51.510
Supreeth Madapur Suresh: covered PCAST in the GPU tutorial, but it was all.
264
00:35:52.650 --> 00:36:04.500
Supreeth Madapur Suresh: I think you guys are aware of it, but it was a lot of information in a short amount of time, so we'd be happy to go over it again to explain the different kinds of PCAST and how to use them in the code.
265
00:36:05.700 --> 00:36:22.440
Mick Coady: Thanks, Supreeth. Yeah, I am sure that my brain was full by the time we got there, so, like I said, don't take my word for it that no one else is familiar with it. I'm probably not very representative. Yeah, Sina.
266
00:36:22.710 --> 00:36:35.070
Irfan: So sorry, one thing I saw some latency now, would you be open to maybe doing a 10-to-15-minute refresher at the next GT GT, and then we can have a deep dive later on.
267
00:36:36.870 --> 00:36:40.200
Supreeth Madapur Suresh: yeah absolutely yeah we can yeah yeah.
268
00:36:40.470 --> 00:36:50.250
cmille73: Um, but I think one of the issues that Supreeth and I were running into when we were helping John with PCAST as well is, you know, we've been using it for a few years, and we feel.
269
00:36:50.640 --> 00:37:06.390
cmille73: I have felt fairly comfortable with it, but we did find with the most recent compiler changes that they've really changed their documentation, so I really like the idea, Irfan, of having NVIDIA share.
270
00:37:07.560 --> 00:37:09.270
cmille73: The updates to the software.
271
00:37:12.570 --> 00:37:15.180
Irfan: Thank you for the feedback appreciate it you.
272
00:37:15.450 --> 00:37:15.870
Seen.
273
00:37:18.780 --> 00:37:20.670
Mick Coady: Thomas, you're next.
274
00:37:23.190 --> 00:37:33.480
Thomas Hauser: Thanks yeah I think it's the way of I don't know Community I mean again this community of practice is there a way we can create something where.
275
00:37:34.530 --> 00:37:43.170
Thomas Hauser: I don't know what the right tool is kind of a database of best practices, I mean it means please just share this one script.
276
00:37:44.610 --> 00:37:58.290
Thomas Hauser: Around, kind of the sample script in Slack, but I think not everybody is doing this. A way that these things can be contributed and people can look at it, or I'm not sure what the best way is of, kind of.
277
00:37:59.670 --> 00:38:02.880
Thomas Hauser: Creating these best practices in our Community of practice.
278
00:38:07.170 --> 00:38:08.220
Mick Coady: yeah like.
279
00:38:09.780 --> 00:38:11.190
Mick Coady: I guess a portal or.
280
00:38:12.240 --> 00:38:14.190
Mick Coady: A repository of all this information.
281
00:38:15.390 --> 00:38:17.670
Thomas Hauser: yeah people can just add.
282
00:38:18.390 --> 00:38:27.810
Thomas Hauser: More content and then maybe it is a your review isn't this already over could, in my opinion, just people add these scenes hate is.
283
00:38:29.250 --> 00:38:33.870
Thomas Hauser: The script and it helps me doing better mapping of gpus.
284
00:38:34.950 --> 00:38:35.310
Mick Coady: hmm.
285
00:38:40.290 --> 00:38:40.890
Mick Coady: We could.
286
00:38:42.900 --> 00:38:48.900
Mick Coady: Use the wiki homepage for this group that i've set up it's it's still.
287
00:38:50.280 --> 00:38:55.050
Mick Coady: needs some more meat on those bones, so something like that would be, I think, useful, Thomas.
288
00:38:57.150 --> 00:38:59.820
Mick Coady: It could, it could fit well within that.
289
00:39:02.310 --> 00:39:04.650
Mick Coady: Okay, thanks yeah Daniel.
290
00:39:08.280 --> 00:39:12.540
Daniel Howard (CSG, NCAR): yeah, I just wanted to point out this having recently met with JEREMY on.
291
00:39:13.770 --> 00:39:23.880
Daniel Howard (CSG, NCAR): His FastEddy model and doing some co founders awesome there we were discussing a little bit that I mentioned before, as well, in terms of like setting up times, perhaps to.
292
00:39:24.300 --> 00:39:32.640
Daniel Howard (CSG, NCAR): coordinate with other developers who want maybe provide support on the codes and such so if the group here has any input towards.
293
00:39:33.090 --> 00:39:46.170
Daniel Howard (CSG, NCAR): Coordinating such an effort of like offering like a very accessible like a profiling support option that's something at least I might be looking into so feel free to reach out if you have any questions there or thoughts on how to promote that.
294
00:39:50.220 --> 00:39:50.970
Mick Coady: Thanks JEREMY.
295
00:39:53.850 --> 00:39:55.320
Mick Coady: Senior did you have anything else.
296
00:39:58.020 --> 00:39:58.860
cmille73: Oh sorry.
297
00:39:59.820 --> 00:39:59.850
Mick Coady: I.
298
00:40:00.060 --> 00:40:02.760
Mick Coady: Just I just want make sure we caught you did.
299
00:40:04.740 --> 00:40:05.310
Mick Coady: yeah john.
300
00:40:06.060 --> 00:40:09.960
John Dennis (he/him): So, Daniel, you were encouraging a hackathon.
301
00:40:11.580 --> 00:40:14.820
John Dennis (he/him): I think in the Slack a couple weeks ago.
302
00:40:16.710 --> 00:40:17.220
Daniel Howard (CSG, NCAR): yeah.
303
00:40:18.600 --> 00:40:29.040
Daniel Howard (CSG, NCAR): I mean, I see that was, I think when Brian Pham reading about that that meeting with nvidia with the training team and such and they're also by bringing that the same so we put up a reminder that point.
304
00:40:29.550 --> 00:40:39.840
Daniel Howard (CSG, NCAR): um I think is past the deadline for the wonder back ready to join, but I do recall the mentioning if you still wanted to join, that we could try to reach out to them, to see if they still space, but um.
305
00:40:40.110 --> 00:40:44.880
John Dennis (he/him): I mean, do we want to host a hackathon? I mean, this is maybe something that.
306
00:40:46.080 --> 00:40:49.140
John Dennis (he/him): would happen in spring or fall of next year.
307
00:40:49.770 --> 00:41:04.950
Brian Vanderwende: Yeah, that was part of the conversation as well, John. You know, we really an MC asked for this, we've gotten a description of the various boot camp and hackathon training offerings that NVIDIA would help co-host.
308
00:41:06.060 --> 00:41:15.720
Brian Vanderwende: And they, you know, initially said, for the next couple of months, here are some events with, like, you know, DOE partners and things like that, that could be open to teams that you have that are interested.
309
00:41:16.170 --> 00:41:21.780
Brian Vanderwende: So the initial step that we took was to forward that information to the user community, just to be prompt, but I think.
310
00:41:22.860 --> 00:41:28.650
Brian Vanderwende: it's fully our expectation that as Derecho gets closer, you know, especially after he was here.
311
00:41:29.310 --> 00:41:38.310
Brian Vanderwende: That we would try to host some of those events, and they had talked about doing things too, like even collaborating on hosting with, like, NOAA or something, so that there's more of a.
312
00:41:38.850 --> 00:41:48.510
Brian Vanderwende: You know, community, that, you know, people from different sites can come learn from each other, but we would use our resources, so it would be more tailored to our user experiences.
313
00:41:49.710 --> 00:41:57.510
Irfan: yeah and that's the point i'd like to emphasize, they did offer to customize a part of the hackathon for our user community.
314
00:41:58.740 --> 00:42:00.360
John Dennis (he/him): I think that would be great.
315
00:42:01.980 --> 00:42:02.970
John Dennis (he/him): I like that idea.
316
00:42:07.380 --> 00:42:07.710
Mick Coady: Good.
317
00:42:08.220 --> 00:42:15.780
John Dennis (he/him): All right, I have another question, and this is maybe an NHUG discussion. You know, there was.
318
00:42:16.860 --> 00:42:17.700
John Dennis (he/him): There was some.
319
00:42:19.980 --> 00:42:38.130
John Dennis (he/him): Queue issues on Casper a couple weeks ago, where the thing was being flooded, and, you know, we kind of thought that it was like, oh, the queue's really backed up, but it was kind of the queue struggling over a particular script.
320
00:42:39.330 --> 00:42:46.620
John Dennis (he/him): I got that by looking on the Slack NCAR GPU channel, but of course it was impacting all of Casper.
321
00:42:48.870 --> 00:42:56.760
John Dennis (he/him): And, you know, I got a really prompt response from John and Ben on, you know, kind of a workaround on that.
322
00:42:57.780 --> 00:43:02.070
John Dennis (he/him): Mick, I think, as well. So is this announced elsewhere?
323
00:43:05.160 --> 00:43:07.200
Mick Coady: Is what announced elsewhere, John?
324
00:43:07.200 --> 00:43:14.160
John Dennis (he/him): Well, I mean, you know, so when my GPU jobs weren't going in, I thought, oh, it's.
325
00:43:14.610 --> 00:43:23.580
John Dennis (he/him): Casper is really busy. I didn't think, oh, the queue system is having issues because of a particular user. What was that.
326
00:43:24.360 --> 00:43:41.670
John Dennis (he/him): I mean, you know, I wouldn't have got that information if I hadn't been on Slack and asked why my GPU jobs weren't going out, so I'm just wondering, is that announced elsewhere? I mean, Mick, I see your emails that come out, but.
327
00:43:43.650 --> 00:43:54.120
John Dennis (he/him): I'm just wondering, if this is announced on a Slack channel, if it should be, because, I mean, for example, this was impacting all users of Casper.
328
00:43:55.170 --> 00:44:00.270
John Dennis (he/him): So people in CGD, for example, were wondering why their analysis jobs weren't going through.
329
00:44:01.590 --> 00:44:06.540
John Dennis (he/him): And the people at CGD probably did not think to look on the GPU.
330
00:44:08.220 --> 00:44:09.090
John Dennis (he/him): Slack channel.
331
00:44:12.720 --> 00:44:20.790
Mick Coady: Right yeah, I guess, to answer your question, I think that that information was probably not widely available to most users.
332
00:44:21.510 --> 00:44:25.620
Mick Coady: Okay it's if that's the kind of the root of your question.
333
00:44:26.850 --> 00:44:31.230
John Dennis (he/him): yeah I mean should I guess i'm asking a question should it be.
334
00:44:33.990 --> 00:44:34.710
Mick Coady: Probably.
335
00:44:35.790 --> 00:44:46.110
Mick Coady: Do you have any suggestions on how best to do that, outside of, like, Notifiers? Because these things kind of tend to boil up in the middle of the day, right, and.
336
00:44:47.730 --> 00:44:54.780
John Dennis (he/him): I'm just wondering if there should be, like, kind of a Slack channel that's for general.
337
00:44:56.070 --> 00:45:01.020
John Dennis (he/him): You know, Casper or general stuff that would say, like.
338
00:45:02.280 --> 00:45:05.940
John Dennis (he/him): The queue system on Cheyenne or the queue system on Casper is.
339
00:45:07.080 --> 00:45:09.810
John Dennis (he/him): undergoing difficulties, we're looking into it.
340
00:45:09.930 --> 00:45:19.950
John Dennis (he/him): Right so, then you would see oh I don't have to put in some kind of a query because people already know that it's going on and it's and it's not that.
341
00:45:21.300 --> 00:45:21.810
John Dennis (he/him): You know.
342
00:45:24.690 --> 00:45:26.250
Mick Coady: Somebody's working on that, right.
343
00:45:26.280 --> 00:45:28.050
Mick Coady: yeah it's yeah yeah.
344
00:45:30.390 --> 00:45:41.670
Supreeth Madapur Suresh: But the problem I have one on one fix that they could try, one of the things that a try as good as the slack channel in the modifier of the visibility.
345
00:45:42.300 --> 00:45:50.250
Supreeth Madapur Suresh: So that other people from other labs could also join, or mostly, like, Cheyenne and Casper users could join the group.
346
00:45:50.610 --> 00:46:10.890
Supreeth Madapur Suresh: And, especially in the general channel, you could actually communicate the details. If Casper is experiencing a problem, you could communicate to everyone in that general channel, so I think we could get more users signed up for Slack and communicate from there.
347
00:46:13.980 --> 00:46:16.260
Supreeth Madapur Suresh: For today, especially for the real time updates.
348
00:46:17.040 --> 00:46:24.390
Mick Coady: You know yeah well in that we might get more readership or followings on that because we know just from.
349
00:46:25.650 --> 00:46:26.730
Mick Coady: Experience that.
350
00:46:27.870 --> 00:46:38.100
Mick Coady: Many users do not read the Daily Bulletin, they filter the Notifiers, and so they miss out on that information, so.
351
00:46:39.210 --> 00:46:52.110
Mick Coady: We're always keenly interested in finding new or better ways to reach users, especially in cases like what John and you were talking about, so.
352
00:46:53.070 --> 00:47:01.860
Mick Coady: Yeah, it'd be easy to put a notice out in the Daily Bulletin, if people read it, and maybe just start getting the word out.
353
00:47:04.200 --> 00:47:05.580
Mick Coady: yeah yeah JEREMY.
354
00:47:07.740 --> 00:47:18.060
Jeremy Sauer: But so are we talking about the other day, when one user submitted like 10,000-plus jobs and, you know, basically took out the PBS server on Casper? Is that the one we're talking about?
355
00:47:19.110 --> 00:47:22.440
Mick Coady: I think that's one of the cases we're talking about there's been.
356
00:47:23.760 --> 00:47:23.880
i've.
357
00:47:25.170 --> 00:47:26.250
Mick Coady: kind of lost track of.
358
00:47:27.510 --> 00:47:35.310
Mick Coady: Some of these crises we've been dealing with recently, with both Cheyenne and Casper getting hammered by.
359
00:47:37.050 --> 00:47:46.740
Mick Coady: users who either had a bug in their script or they're naive or new and didn't realize what they were doing, so we've had.
360
00:47:49.320 --> 00:47:53.070
Mick Coady: it's not just one type of problem right so.
361
00:47:54.150 --> 00:47:54.870
Jeremy Sauer: I guess.
362
00:47:55.770 --> 00:47:56.400
Jeremy Sauer: i'm asking.
363
00:47:57.480 --> 00:48:08.670
Jeremy Sauer: You know, because I think, in the end, the Notifier is pretty good, right? The Notifier comes out when something special is going on, or something kind of different is going on, and.
364
00:48:09.210 --> 00:48:19.020
Jeremy Sauer: I know at least amongst my colleagues, who are all users, they're not CISL people, they're all non-CISL people, they pay attention to Notifiers. They see a Notifier, you know, they know something's going on. Now,
365
00:48:19.620 --> 00:48:25.140
Jeremy Sauer: The speed at which a notification can come out is not you know it's not necessarily guaranteed so.
366
00:48:26.730 --> 00:48:38.460
Jeremy Sauer: You know, in my case, I'll often submit a ticket, and if I spend three minutes submitting a ticket for something I see, you know, I figure I'm doing my part to help the community HPC user.
367
00:48:38.970 --> 00:48:48.300
Jeremy Sauer: Environment. I think the notion of the Slack channel's cool. I'm on there, a lot of new stuff, I do say things on the Slack channel.
368
00:48:49.170 --> 00:49:01.530
Jeremy Sauer: But I think if you're finding people are filtering the Daily Bulletin and filtering the Notifiers, they're already feeling like they get more information than they can digest, so sending them another form of.
369
00:49:02.910 --> 00:49:06.720
Jeremy Sauer: You know a lot of information might not necessarily.
370
00:49:07.860 --> 00:49:25.230
Jeremy Sauer: You know, lead to any more sort of paying attention or interaction with the users. I just want to encourage that. I think the Notifier is great, I love the system, there've been a couple of times when it's been useful to me, and I think, certainly in a few of these cases.
371
00:49:26.400 --> 00:49:38.610
Jeremy Sauer: You know the flooding of the server thing yeah i've submitted a ticket and I figured that's just me doing my part to try and be a responsible member of the user community.
372
00:49:40.410 --> 00:49:43.590
Mick Coady: Good thanks thanks for the feedback yeah Ben.
373
00:49:46.290 --> 00:49:47.820
Ben Matthews: From the system perspective.
374
00:49:48.840 --> 00:50:04.350
Ben Matthews: John, I'm glad we can help, but the current membership of the GPU Slack is a small number of friendly users, so I at least feel like I have to vet my responses less than I might for a Notifier that goes out to three or four thousand people.
375
00:50:05.910 --> 00:50:13.710
Ben Matthews: I think if you were to have the same list of people on the Slack, you would get similarly filtered responses.
376
00:50:15.240 --> 00:50:19.290
Ben Matthews: That makes sense, trying to trying to be diplomatic here but.
377
00:50:22.110 --> 00:50:32.010
Ben Matthews: We have to be careful that the larger Community gets the right story, whereas with you, if I say something wrong, I feel like I can take it back 10 minutes later and say oh here's what really happened.
378
00:50:35.820 --> 00:50:36.720
John Dennis (he/him): Okay well.
379
00:50:37.740 --> 00:50:39.690
John Dennis (he/him): Yes, there, I see that point.
380
00:50:43.980 --> 00:50:54.600
Irfan: I add that, instead of being reactive if there are gaps in communication, I would prefer that we have a strategy behind changing enhancing.
381
00:50:55.020 --> 00:51:16.200
Irfan: Our communication strategy, rather than be reactive to one or two incidents, so the feedback is really good and let's kind of discuss this offline you know because improving our communication strategy is a good thing, but it should be based on some solid strategy, not just react.
382
00:51:17.400 --> 00:51:17.670
Irfan: yeah.
383
00:51:19.170 --> 00:51:23.910
Mick Coady: I agree, otherwise we end up with a whack-a-mole kind of thing, right.
384
00:51:24.930 --> 00:51:25.230
Mick Coady: yeah.
385
00:51:25.320 --> 00:51:32.910
Irfan: But that doesn't mean that you should not provide input I think it's good input it's a valuable input and should be considered.
386
00:51:33.690 --> 00:51:49.710
Irfan: You know, how we can take some users who may not read the Notifier, but are more Slack- or other-tool-friendly, how can we bring them on board? Maybe more communication is needed about the Notifier, if that's our primary tool.
387
00:51:52.410 --> 00:52:03.330
Mick Coady: It is my primary tool for getting word out, and I think CSG's, about these incidents that flare up through the day.
388
00:52:04.650 --> 00:52:08.130
Mick Coady: Whereas the Daily Bulletin's more for broader.
389
00:52:09.420 --> 00:52:12.360
Mick Coady: And FYI kinds of things.
390
00:52:14.220 --> 00:52:16.500
Mick Coady: It that's how i've been using it anyway.
391
00:52:22.920 --> 00:52:23.430
Mick Coady: Okay.
392
00:52:26.340 --> 00:52:28.350
Mick Coady: All right, well if there isn't.
393
00:52:29.880 --> 00:52:35.700
Mick Coady: If there aren't any more comments, again, I can stick around for a little bit. I do have another meeting to run to, but.
394
00:52:36.750 --> 00:52:41.460
Mick Coady: If somebody has something else, they want to talk about please, please say so now.
395
00:52:44.610 --> 00:52:45.750
Jeremy Sauer: Sorry, one last thing, Mick.
396
00:52:46.020 --> 00:52:46.320
sure.
397
00:52:47.340 --> 00:52:48.990
Jeremy Sauer: I guess i'm just noticing here.
398
00:52:50.070 --> 00:52:52.380
Jeremy Sauer: seems to be a predominantly CISL.
399
00:52:54.390 --> 00:52:59.640
Jeremy Sauer: CISL participation here in GT GT, and I guess I would.
400
00:53:01.440 --> 00:53:06.720
Jeremy Sauer: anticipate I mean I thought that the idea of GT was to be you know kind of more.
401
00:53:08.220 --> 00:53:27.300
Jeremy Sauer: At well more and not inclusive or on word, but just have representation from other labs more, and I could be wrong, and please do correct me if I'm wrong, but at the moment it seems pretty CISL-dominated, and so, if I could suggest, um, you know, a targeted.
402
00:53:29.370 --> 00:53:38.610
Jeremy Sauer: You know, somehow balancing of the inputs here of the thoughts of the you know or just that there's a need there's a need right like I mean.
403
00:53:39.450 --> 00:53:48.240
Jeremy Sauer: there's a number of labs, all the labs use the HPC facilities, and we want all the labs using the GPUs, and so.
404
00:53:48.930 --> 00:54:04.080
Jeremy Sauer: I'd just like to encourage that we try and find a way to engage with more folks. I love all the folks from CISL, this is great stuff, love talking to all of you and discussing with all of you, but I think, for NCAR's sake, we need more engagement from other labs too.
405
00:54:05.010 --> 00:54:05.460
Mick Coady: New org.
406
00:54:05.850 --> 00:54:06.870
Jeremy Sauer: My two cents.
407
00:54:06.990 --> 00:54:19.770
Irfan: Thank you, Jeremy, for saying that. I think that is something that's really important, and we have to reach out to a broader audience, so Mick, I can definitely, this is really good feedback.
408
00:54:20.040 --> 00:54:21.990
Mick Coady: yeah yeah I.
409
00:54:23.400 --> 00:54:36.990
Mick Coady: I noticed that when people were starting to show up for the meeting, I thought, oh, is this going to be all CISL, right? And, of course, very glad to see you and others from outside of CISL join us.
410
00:54:38.040 --> 00:54:53.340
Mick Coady: I'll blame myself for, perhaps, the way today's meeting is weighted, for not getting out a notice earlier about this, and I will do better moving forward.
411
00:54:54.390 --> 00:54:54.900
Mick Coady: Plus.
412
00:54:57.690 --> 00:55:04.890
Mick Coady: Those that you know, have been on their part of the GT GT distribution group email group.
413
00:55:06.120 --> 00:55:12.450
Mick Coady: I will reach out to them as well, and remind them of the meetings some.
414
00:55:14.010 --> 00:55:20.760
Mick Coady: Plus, then we can also solicit more people. Jeremy, this doesn't have to be limited to just you from your group, right.
415
00:55:22.710 --> 00:55:34.350
Mick Coady: And I noticed, you know, Dave Gill was here earlier, and others, so perhaps expanding our target audience will help on them.
416
00:55:36.360 --> 00:55:50.280
Daniel Howard (CSG, NCAR): Prior to like next meetings and such a good like solicit from NASA so people are really actually I want a general to like contribute it to the agenda, then they might felt feel I important to discuss the meetings that would make that make that available to be a.
417
00:55:50.760 --> 00:55:52.290
Mick Coady: Good point good idea.
418
00:55:57.510 --> 00:55:58.080
Mick Coady: Okay.
419
00:56:01.830 --> 00:56:10.890
Mick Coady: Well, with that, I think it sounds like we're done. Sounds like we've run out of, not steam, but maybe time for today, so.
420
00:56:12.060 --> 00:56:22.410
Mick Coady: With that, I'll thank everybody for showing up, for attending, and for your great participation, and look forward to doing this again next month, okay.
421
00:56:25.140 --> 00:56:26.430
Mick Coady: All right, take care, everybody.
422
00:56:26.730 --> 00:56:27.240
John Dennis (he/him): Thank you.
423
00:56:27.510 --> 00:56:29.370
Mick Coady: thanks John.