1
00:00:00.000 --> 00:00:08.010
Daniel Howard (CSG, NCAR): In our discussion today we can further explore that in terms of what might be best for the community, and that's kind of pointing to trying to
2
00:00:08.429 --> 00:00:17.640
Daniel Howard (CSG, NCAR): gauge the audience's needs ahead of time as much as possible, as well as further understanding what we, as a community that has maybe more exposure to GPU computing,
3
00:00:18.510 --> 00:00:35.910
Daniel Howard (CSG, NCAR): can maybe better predict, over time, what might be most worthwhile for people's learning experience and time spent, if, say, CUDA Fortran might not be worthwhile in the long term versus age, and then just encourage them instead to focus on OpenACC and Fortran.
4
00:00:37.140 --> 00:00:38.100
Daniel Howard (CSG, NCAR): Personally I.
5
00:00:39.420 --> 00:00:40.680
Daniel Howard (CSG, NCAR): I find like.
6
00:00:41.790 --> 00:00:47.310
Daniel Howard (CSG, NCAR): making most researchers who maybe don't have a computer science background go through the
7
00:00:48.150 --> 00:00:57.150
Daniel Howard (CSG, NCAR): slog of learning CUDA all the time is maybe not worth their time as much, so emphasizing the directive-based approaches like OpenACC
8
00:00:57.630 --> 00:01:05.220
Daniel Howard (CSG, NCAR): will be most beneficial for them, considering it's likely to allow for significant improvement that meets their needs most of the time.
9
00:01:05.520 --> 00:01:13.980
Daniel Howard (CSG, NCAR): rather than needing to get the fine-grained improvement that comes from being in a position to include CUDA programming. But yeah, that's a good point to make.
10
00:01:14.490 --> 00:01:14.850
yeah.
11
00:01:18.540 --> 00:01:26.850
Daniel Howard (CSG, NCAR): But, as you may recall, there are already past workshops, led by Supreeth and other student trainees, that are already posted and archived on GitHub.
12
00:01:27.900 --> 00:01:40.770
Daniel Howard (CSG, NCAR): From that session, we basically learned that we need to separate material into more bite-sized chunks and make things more digestible, in smaller pieces, so people can then pick and choose, say, between the archived sessions in terms of what's important for them to learn.
13
00:01:42.060 --> 00:01:45.870
Daniel Howard (CSG, NCAR): Again, the interactive coding examples were beneficial, but were difficult to coordinate.
14
00:01:46.500 --> 00:02:00.330
Daniel Howard (CSG, NCAR): So we just need to plan ahead of time, gauging and anticipating the significantly longer time this is likely to take in terms of setting up such sessions and doing the live coding examples and demonstrations.
15
00:02:01.350 --> 00:02:09.360
Daniel Howard (CSG, NCAR): And then, of course, environments and access to the GPUs themselves can be tedious, expensive, and sometimes infeasible for much larger groups, but that'll depend on
16
00:02:09.660 --> 00:02:22.020
Daniel Howard (CSG, NCAR): the amount of interest we might see in terms of people signing up to engage with these seminars down the line. But, as I said, there's already a wealth of materials out there; you can see links to various relevant
17
00:02:23.400 --> 00:02:31.740
Daniel Howard (CSG, NCAR): resources that people could be learning from in terms of engaging with GPU computing, but we want to not just replicate what is already out there, but
18
00:02:32.400 --> 00:02:45.000
Daniel Howard (CSG, NCAR): contribute to and build upon them, in the sense of serving specifically the Earth science domain. So how we can think about doing that effectively, I think, is really important to discuss amongst this group as we are able.
19
00:02:46.590 --> 00:02:59.850
Daniel Howard (CSG, NCAR): So, for example, some recent best practices were highlighted in the Exascale Computing Project webinars. This is done by a David victor's l&l; he was highlighting their modernization process and pointed out
20
00:03:01.020 --> 00:03:03.000
Daniel Howard (CSG, NCAR): that there are specific steps that are often
21
00:03:04.140 --> 00:03:06.450
Daniel Howard (CSG, NCAR): consistent across the app modernization
22
00:03:07.710 --> 00:03:21.780
Daniel Howard (CSG, NCAR): tactics they've observed, and we want to make sure that for each of these steps there's a definitive course of action by which developers and scientists can approach each step, such that they know how to
23
00:03:22.860 --> 00:03:29.460
Daniel Howard (CSG, NCAR): approach that and perform those optimization techniques within the GPU computing space or, if not, at least know how to
24
00:03:29.790 --> 00:03:38.250
Daniel Howard (CSG, NCAR): access materials and resources that can help them learn how to do so. You see, for example, on the right side of the screen, it took like four years for some of the projects to
25
00:03:38.760 --> 00:03:50.430
Daniel Howard (CSG, NCAR): go through that porting process, and we want to try to minimize the amount of time it takes to do some of these porting processes, especially for these long-standing historical codes that may have strong interest in porting to GPUs.
26
00:03:52.170 --> 00:03:58.740
Daniel Howard (CSG, NCAR): So, in any case, there are lots of projects that already incorporate or will be using GPUs in their usage and deployment,
27
00:03:59.790 --> 00:04:02.850
Daniel Howard (CSG, NCAR): many over here at NCAR as well as elsewhere.
28
00:04:03.870 --> 00:04:05.310
Daniel Howard (CSG, NCAR): we're considering through the.
29
00:04:05.370 --> 00:04:14.490
Daniel Howard (CSG, NCAR): through the seminars, to coordinate with some of these projects and find perhaps specific kernels we can extract as, like, example
30
00:04:15.420 --> 00:04:24.300
Daniel Howard (CSG, NCAR): models we can highlight within these seminars, which then can be part of the learning experiences within the training, keeping people interested in terms of specific Earth science education.
31
00:04:25.080 --> 00:04:29.790
Daniel Howard (CSG, NCAR): But that can be very engaging for many participants in terms of like learning the material.
32
00:04:31.050 --> 00:04:38.100
Daniel Howard (CSG, NCAR): And so basically we want to figure out ways that we can learn from some of these advanced, some of the above projects with respect to best practices and
33
00:04:38.460 --> 00:04:51.150
Daniel Howard (CSG, NCAR): reach out to some of these development teams in terms of what works best for them, and how we can maybe translate that into a way in which we can teach it to other up-and-coming GPU developers within the science community.
34
00:04:52.380 --> 00:05:00.060
Daniel Howard (CSG, NCAR): So questions remain in that space: are the needs that we're trying to meet more of the technical variety, in terms of the actual software engineering and
35
00:05:00.420 --> 00:05:12.000
Daniel Howard (CSG, NCAR): implementing the software and code, or is it access to hardware, I/O bottlenecks, or other, softer problems like version control, software lifecycle management, or other things like that?
36
00:05:13.140 --> 00:05:21.960
Daniel Howard (CSG, NCAR): We also want to be able to identify clear metrics to determine which scientific problems in the Earth sciences will benefit from GPUs, and help users be able to
37
00:05:22.470 --> 00:05:28.770
Daniel Howard (CSG, NCAR): discern that for themselves. It's not always the case that a specific problem can benefit from GPUs, and
38
00:05:29.220 --> 00:05:34.800
Daniel Howard (CSG, NCAR): helping people answer that question effectively, so that they're not wasting time exploring that space, is a useful skill to
39
00:05:35.250 --> 00:05:49.830
Daniel Howard (CSG, NCAR): at least develop, and at least inform users of how to approach. And then lastly, try to differentiate between the needs of, say, the physics-based GPU computing community and those within the artificial intelligence and machine learning GPU computing
40
00:05:51.030 --> 00:05:59.430
Daniel Howard (CSG, NCAR): domains: what sort of different needs are present in both of those, where they intersect and where they don't, and trying to serve that as best as possible.
41
00:06:00.720 --> 00:06:07.020
Daniel Howard (CSG, NCAR): So again, here's the survey so far. I've already presented this twice at other places.
42
00:06:07.710 --> 00:06:13.110
Daniel Howard (CSG, NCAR): I don't know if you want to see what the non-GPU-focused community has kind of already
43
00:06:13.710 --> 00:06:24.810
Daniel Howard (CSG, NCAR): commented. Fortran actually wins out, with not as many C or C++ programmers, but there's actually a large amount of Python interest in that space, and
44
00:06:25.560 --> 00:06:31.230
Daniel Howard (CSG, NCAR): further down, responses indicate at least a sense of interest so far, with a small sample size, in
45
00:06:32.370 --> 00:06:42.300
Daniel Howard (CSG, NCAR): Python computing and GPU computing. We already have some detailed responses to the open-ended questions that we can maybe start to look through now, but
46
00:06:42.660 --> 00:07:00.180
Daniel Howard (CSG, NCAR): honestly, it might be useful to wait till we have a more representative survey to actually analyze this feedback. But I'm happy to go into this further, and we can spend time on this now, today, or wait till a next meeting, another time.
47
00:07:01.440 --> 00:07:09.180
Daniel Howard (CSG, NCAR): Yeah, CUDA programming with Python, for example, is winning out, even over CUDA programming in Fortran, which is also already winning out over C.
48
00:07:09.510 --> 00:07:17.370
Daniel Howard (CSG, NCAR): That's just what people would want to learn. Whether or not that's most effective for them to be learning is maybe a question that we can gauge in terms of
49
00:07:18.090 --> 00:07:25.590
Daniel Howard (CSG, NCAR): where it might be appropriate to maybe shift people in a different direction, as John was referring to earlier.
50
00:07:27.510 --> 00:07:33.870
Daniel Howard (CSG, NCAR): But hopefully that's within the next talk coming up, on verifying correctness with PCAST, so
51
00:07:34.290 --> 00:07:43.110
Daniel Howard (CSG, NCAR): there's obviously a strong interest in that concern within the community when implementing a GPU code; there's a slightly stronger interest there,
52
00:07:43.920 --> 00:07:56.880
Daniel Howard (CSG, NCAR): again with a small sample size, towards addressing that particular concern. But yeah, I think that's all I have today in this presentation so far, but I'd be happy to discuss anything I have
53
00:07:57.510 --> 00:08:12.120
Daniel Howard (CSG, NCAR): presented so far in more detail, in terms of trying to come up with the best approach for starting this training series; about January, I think, is what we're aiming for.
54
00:08:13.890 --> 00:08:23.550
Daniel Howard (CSG, NCAR): Most people are suggesting the same, just considering many of the conferences and other obligations and holidays coming up through November and December. So
55
00:08:24.870 --> 00:08:27.480
Daniel Howard (CSG, NCAR): Thanks again, and happy to take any feedback.
56
00:08:31.050 --> 00:08:31.590
Mick Coady: You know.
57
00:08:33.120 --> 00:08:46.230
Mick Coady: to emphasize one of the things Daniel was saying: if any of you have not yet completed that survey, please do so. This group, if nothing else, should be kind of leading the way with
58
00:08:47.910 --> 00:09:01.770
Mick Coady: how we move forward and what we consider, so we're keenly interested in your feedback. And it's not limited, of course, to the survey, right? We've got this meeting, and
59
00:09:02.340 --> 00:09:15.330
Mick Coady: I'll remind you, you can always reach out to us individually or via the GT email group, which I'll post in the chat here in just a minute. Yeah, John, you've got your hand up.
60
00:09:15.690 --> 00:09:19.350
John Dennis (he/him): I was not aware that CIS had this capability.
61
00:09:20.400 --> 00:09:24.810
John Dennis (he/him): Could you pass on the contact information, or where you got the
62
00:09:26.220 --> 00:09:32.580
Daniel Howard (CSG, NCAR): Yeah, I may be mistaken, as I think I remember seeing some of that development happening while I was at
63
00:09:34.230 --> 00:09:40.860
Daniel Howard (CSG, NCAR): nfl, but I'll have to double check that. But yeah, I'll follow up and let you know.
64
00:09:41.550 --> 00:09:42.000
John Dennis (he/him): Thank you.
65
00:09:44.820 --> 00:09:45.150
Mick Coady: yeah.
66
00:09:47.610 --> 00:09:48.180
Mick Coady: Sorry.
67
00:09:49.440 --> 00:09:56.550
Supreeth Madapur Suresh: Yeah, along the same lines for WRF: TempoQuest have a version of CUDA WRF.
68
00:09:57.450 --> 00:10:10.440
Supreeth Madapur Suresh: There's another version, an OpenACC WRF, which is a much older version, but other than that, at least I'm not aware of any other development in the GPU space for WRF.
69
00:10:11.520 --> 00:10:15.990
Supreeth Madapur Suresh: Along with that, like actually I wanted to.
70
00:10:17.460 --> 00:10:33.030
Supreeth Madapur Suresh: Maybe you're going to cover it or maybe you already covered it, but I just wanted to bring up this topic of asking the community to be the presenters in the GPU workshop and also, if you have specific codes that you are working on,
71
00:10:34.740 --> 00:10:41.670
Supreeth Madapur Suresh: can we generate some examples from that, from the science community, and then present them in the workshop series?
72
00:10:47.310 --> 00:11:02.130
Daniel Howard (CSG, NCAR): Yeah, I think that's appropriate, in terms of some of what I was suggesting earlier about reaching out to some of the efforts, whether they're successful or not yet, such as WRF or FastEddy or whatever other projects might be relevant, towards
73
00:11:03.150 --> 00:11:11.310
Daniel Howard (CSG, NCAR): contributing perhaps a talk as part of the series later on. It would be one of the later ones, since it's perhaps a more advanced session, but it would perhaps be useful to
74
00:11:11.820 --> 00:11:18.810
Daniel Howard (CSG, NCAR): attendees in seeing real-life examples and how others have approached their GPU computing refactoring efforts, for example.
75
00:11:22.140 --> 00:11:22.290
yeah.
76
00:11:25.080 --> 00:11:34.350
Davide Del Vento: Yeah, just wanted to mention that, as far as I know, the TempoQuest WRF is actually mostly, if not entirely, OpenACC, not CUDA.
77
00:11:35.640 --> 00:11:43.500
Davide Del Vento: Now, that's closed source, so we can't know for sure exactly what it does, but that's my understanding from the conversation I had with the developers.
78
00:11:44.520 --> 00:11:45.660
Supreeth Madapur Suresh: Oh uh.
79
00:11:47.670 --> 00:11:56.190
Supreeth Madapur Suresh: Okay, I can check that and confirm, because as far as I was aware, it was entirely written in CUDA, not OpenACC, but
80
00:11:56.910 --> 00:12:00.960
Davide Del Vento: It could include some, as I said, but as far as I know, it's mostly OpenACC.
81
00:12:02.010 --> 00:12:08.100
Davide Del Vento: But I mean, if you know differently, definitely let me know, because I'm interested.
82
00:12:10.350 --> 00:12:13.110
Mick Coady: I'm not sure how much that will factor into our
83
00:12:14.970 --> 00:12:15.690
Davide Del Vento: No, it's just,
84
00:12:16.230 --> 00:12:19.200
Davide Del Vento: You know, for understanding what they're doing and what they have.
85
00:12:19.200 --> 00:12:22.830
Davide Del Vento: done, and so what is possible instead asked mean but, again, it will be.
86
00:12:24.360 --> 00:12:29.640
Davide Del Vento: I don't know if it's bit-for-bit, but if it needed to be the same result as the source itself.
87
00:12:31.440 --> 00:12:31.980
CP one.
88
00:12:34.470 --> 00:12:35.430
Mick Coady: Yes, Jeremy.
89
00:12:38.880 --> 00:12:40.170
Jeremy Sauer: A couple of things.
90
00:12:41.580 --> 00:12:44.640
Jeremy Sauer: Well, I guess it's interesting, this survey is,
91
00:12:45.690 --> 00:12:52.560
Jeremy Sauer: It looks very, very similar to a census or survey I guess it was called the.
92
00:12:54.180 --> 00:13:08.220
Jeremy Sauer: NCAR Software Development and Application Census, which the Exascale Tiger Team put together and disseminated, and which had about 124 respondents across the various labs at NCAR.
93
00:13:08.760 --> 00:13:20.070
Jeremy Sauer: There looks to be some, I mean, maybe redundancy is not the right word, but some similarity in many of the questions. This survey does seem to focus,
94
00:13:21.630 --> 00:13:34.770
Jeremy Sauer: you know, right out of the gate, on GPUs: how do you want to use GPUs, what kind of training for GPUs would you look for, and so on and so forth. But I only bring it up to say there may be some interesting
95
00:13:35.880 --> 00:13:50.820
Jeremy Sauer: correlations between the two information-gathering exercises. I don't know how many respondents you have on this survey so far, or if you have a sense of where the respondents
96
00:13:51.900 --> 00:14:06.300
Jeremy Sauer: reside or what their affiliation is, whether it's in the quote-unquote science labs or whether it's kind of CISL-centric. But can you comment on that at all, where your respondents seem to be primarily from?
97
00:14:07.410 --> 00:14:15.000
Daniel Howard (CSG, NCAR): So far, I've only presented it to the NHUG and the CISL WIP sessions, which are primarily NCAR-affiliated
98
00:14:16.170 --> 00:14:22.710
Daniel Howard (CSG, NCAR): personnel. I guess NHUG might include others within the university systems as well, but I
99
00:14:23.250 --> 00:14:38.190
Daniel Howard (CSG, NCAR): wasn't being intentional in designing the survey to differentiate where they were coming from; not sure that matters as much. But if anything, I appreciate you connecting me with, or at least my looking up, the Exascale software engineering kind of
100
00:14:39.210 --> 00:14:48.780
Daniel Howard (CSG, NCAR): survey that you mentioned. I wasn't aware that it existed; I'm not sure I was even asked to take it, because I was only brought on in, like, February-ish. So
101
00:14:49.860 --> 00:14:56.970
Daniel Howard (CSG, NCAR): nonetheless, yeah, I think it's important that this one is trying to emphasize the GPU needs, and hopefully at least that
102
00:14:57.750 --> 00:15:12.360
Daniel Howard (CSG, NCAR): separates and distinguishes it from maybe any redundant questions asked in that survey. To your question, so far there have only been 15 responses, and I guess now two more from you guys here within this group, to this survey.
103
00:15:13.620 --> 00:15:14.820
Daniel Howard (CSG, NCAR): And if anything, the plan is,
104
00:15:16.260 --> 00:15:34.050
Daniel Howard (CSG, NCAR): I think we would like to send this out through the Daily Bulletin, or some other venue by which to hopefully get a broader sense of needs among the HPC users at NCAR, rather than just the ones that happened to show up at the meetings this past week, as well as today.
105
00:15:35.310 --> 00:15:49.620
Daniel Howard (CSG, NCAR): But yeah, even now, I guess, if you want to tinker, if you want to suggest a specific question to add for the larger community when we send that out, that might be relevant in terms of still editing the survey as it stands right now.
106
00:15:51.810 --> 00:15:57.960
Jeremy Sauer: Yeah, I guess I'd say there may be some value in making sure that you have some
107
00:15:59.100 --> 00:16:08.790
Jeremy Sauer: access to the census results, because, I mean, that census from the Exascale Tiger Team was more about surveying
108
00:16:09.360 --> 00:16:22.860
Jeremy Sauer: what kind of software development and application activities are occurring at NCAR, predominantly in the science labs: what are people being paid for and spending their time working on?
109
00:16:23.400 --> 00:16:32.580
Jeremy Sauer: What are the skill sets that they have in order to work on those things? For example, do they program in MPI? Do they program in Fortran and C and Python, or in
110
00:16:33.300 --> 00:16:41.400
Jeremy Sauer: more traditional post-processing types of paradigms, like MATLAB or whatever? So it was more of a
111
00:16:41.910 --> 00:16:51.420
Jeremy Sauer: "let's wrap our heads around what kind of software development activities are going on across all of the laboratories at NCAR," and
112
00:16:52.290 --> 00:17:01.350
Jeremy Sauer: where is there any experience in GPU or accelerator programming so far. And I can say that, at least,
113
00:17:01.920 --> 00:17:11.970
Jeremy Sauer: the punch line, out of 124 respondents, is that there's very, very, very, very little experience in GPUs whatsoever.
114
00:17:12.750 --> 00:17:17.820
Jeremy Sauer: So anyway, it'll just be interesting to see, as your survey respondent base grows,
115
00:17:18.600 --> 00:17:29.790
Jeremy Sauer: if that aligns with the type of information that was pulled out of the census, out of, you know, so many respondents across the various laboratories. I think
116
00:17:30.450 --> 00:17:44.460
Jeremy Sauer: what you may find is that, more so for the people who are writing software to actually do Earth science, as opposed to writing or modifying software but not actually doing the science itself,
117
00:17:45.150 --> 00:17:59.010
Jeremy Sauer: they don't really play in GPUs much because they're busy doing the science. And that is a real conundrum, right? That's one of the tough aspects to get at here. But
118
00:18:00.090 --> 00:18:03.510
Jeremy Sauer: yeah any information or progress on that front is probably a good thing.
119
00:18:05.160 --> 00:18:09.330
Daniel Howard (CSG, NCAR): Right, it's a point to point out, in terms of what limitations are preventing people from getting into the
120
00:18:09.330 --> 00:18:14.310
Daniel Howard (CSG, NCAR): space, and that they maybe spend more time focusing on the science and don't have time to learn GPU computing.
121
00:18:14.760 --> 00:18:23.070
Daniel Howard (CSG, NCAR): Honestly, that's already one of the prevailing themes within the "current limitations and preventions towards utilizing GPUs" question in the survey so far,
122
00:18:23.940 --> 00:18:36.930
Daniel Howard (CSG, NCAR): just in terms of lack of time, lack of training, and lack of clarity on the best choices in GPU approaches. So even just spending the time to figure out if it's worth their time to get into it is often a barrier to entry for some people, so far.
123
00:18:38.340 --> 00:18:46.680
Daniel Howard (CSG, NCAR): But just understanding that that's the concern, I think, in dealing with how we're designing this training series, we can help
124
00:18:47.040 --> 00:19:02.640
Daniel Howard (CSG, NCAR): make some of those steps easier and more accessible to people, such as the bite-sized approach to how we're designing the content, so that you can pick and choose and hopefully be able to go into whichever specific aspect you need to learn, and ideally even,
125
00:19:03.810 --> 00:19:13.230
Daniel Howard (CSG, NCAR): you know, like the modern aesthetic of learning today, where you can scroll through a YouTube video to whatever specific spot you need to go to, or like Sidd was mentioning the other day about
126
00:19:13.890 --> 00:19:20.430
Daniel Howard (CSG, NCAR): using the live transcript feature, some of the ways in which Zoom records, say, people speaking and
127
00:19:21.060 --> 00:19:31.170
Daniel Howard (CSG, NCAR): presentations: you can even then click on a particular sentence and it'll take you to that portion of the video. There are ways to implement that which we can hopefully use in this training, for example.
128
00:19:34.110 --> 00:19:47.880
Mick Coady: Jeremy, thanks for the input, as always. Are the survey results from the Exascale Tiger Team available to the public, you know, for us?
129
00:19:49.350 --> 00:20:00.300
Jeremy Sauer: Well, I don't think we've shared them outside of the Exascale Tiger Team, and maybe with the ADs, but certainly TDD has multiple representatives
130
00:20:00.750 --> 00:20:13.530
Jeremy Sauer: on the Exascale Tiger Team, so I would think getting access to that information for you guys in CISL is not a real tough thing to accomplish.
131
00:20:13.710 --> 00:20:23.970
Mick Coady: Sure, yeah. I don't think most of us, many of us anyway, were included in the survey, you know, solicited, so
132
00:20:24.420 --> 00:20:30.180
Jeremy Sauer: Well, in fact, there was an invite from Everett himself,
133
00:20:30.690 --> 00:20:31.800
Jeremy Sauer: followed up by.
134
00:20:32.190 --> 00:20:35.130
Jeremy Sauer: invites, theoretically, from the ADs,
135
00:20:36.150 --> 00:20:39.990
Jeremy Sauer: and multiple emails across the board, like, to the whole of NCAR.
136
00:20:40.110 --> 00:20:48.990
Jeremy Sauer: So there may not have been specific invites, but certainly the entire institution was asked to participate more than once.
137
00:20:49.080 --> 00:20:56.190
Mick Coady: Well, I certainly don't doubt that I could have missed that, so don't take that as
138
00:20:57.960 --> 00:21:04.140
Mick Coady: don't take that as a slam; it's more on me than anything. So thanks.
139
00:21:04.170 --> 00:21:09.570
John Dennis (he/him): Right, I was just going to follow up here: the survey was sent out in September,
140
00:21:10.710 --> 00:21:14.460
John Dennis (he/him): or it was open in September, so that was just, you know, like a
141
00:21:15.900 --> 00:21:19.470
John Dennis (he/him): month and a half ago, or whatever, and there were
142
00:21:21.180 --> 00:21:24.150
John Dennis (he/him): contributions from, you know, people in CISL,
143
00:21:25.470 --> 00:21:26.820
John Dennis (he/him): people that are on this call.
144
00:21:28.500 --> 00:21:36.030
John Dennis (he/him): But again, like Jeremy said, we tried to advertise it fairly widely.
145
00:21:36.840 --> 00:21:38.850
Mick Coady: I'm sure you did, so
146
00:21:41.730 --> 00:21:48.030
Mick Coady: it was just my misunderstanding, or lack of understanding, or awareness.
147
00:21:49.170 --> 00:21:52.680
Sidd: But it was like oh my understanding yeah.
148
00:21:52.950 --> 00:21:55.890
Mick Coady: Okay, good. John, you had your hand up
149
00:21:57.000 --> 00:21:59.460
Mick Coady: before that. Was that what you wanted to respond to?
150
00:22:01.530 --> 00:22:14.610
John Dennis (he/him): Yeah, I was thinking this as well. What Jeremy said was that we've collected this data, and I think it would be interesting to verify that it's consistent.
151
00:22:16.380 --> 00:22:25.440
John Dennis (he/him): I'm also wondering if we're getting the same people or if we're getting a different group.
152
00:22:27.180 --> 00:22:27.540
Mick Coady: yeah.
153
00:22:28.680 --> 00:22:33.750
Mick Coady: Well, by this time next week, or by the time of our December meeting,
154
00:22:35.460 --> 00:22:47.610
Mick Coady: I don't think the plan is to close the survey off; there's not a deadline on it. But we should certainly be pretty close to 100% of what we're going to get by this time next week, and
155
00:22:48.690 --> 00:22:53.730
Mick Coady: we'll be glad to share that with this group then, if not before.
156
00:22:54.750 --> 00:22:55.170
So.
157
00:22:56.250 --> 00:23:04.830
Mick Coady: And then, before we move on: John Klein, you had your hand up earlier, and I just want to make sure you had a chance to
158
00:23:06.270 --> 00:23:07.680
Mick Coady: have your say.
159
00:23:10.350 --> 00:23:11.730
Daniel Howard (CSG, NCAR): Not sure he's on the call anymore.
160
00:23:12.270 --> 00:23:13.890
Mick Coady: Oh okay all right.
161
00:23:16.380 --> 00:23:19.710
Mick Coady: yeah he looks like he has left okay okay good.
162
00:23:21.510 --> 00:23:23.400
Mick Coady: All right, any other comment that.
163
00:23:24.480 --> 00:23:35.340
Mick Coady: Again, we were hoping, and this has been great, to get more of a deeper-dive conversation than what we were able to get at NHUG or the CISL WIP, so I
164
00:23:36.420 --> 00:23:39.600
Mick Coady: think mission accomplished, at least in part. So,
165
00:23:40.710 --> 00:23:41.640
Mick Coady: Yes, DJ.
166
00:23:45.090 --> 00:23:53.010
David John Gagne: Yeah, I mean, from my perspective on GPU usage with regards to machine learning, a lot,
167
00:23:54.900 --> 00:23:58.200
David John Gagne: or pretty much all, of the user base I'm familiar with is operating at
168
00:23:59.220 --> 00:24:11.130
David John Gagne: a much higher level than doing anything with CUDA directly, with the exception of the work, say, Jeremy and his group are doing on FastEddy and the machine learning in there.
169
00:24:12.540 --> 00:24:19.560
David John Gagne: But mostly they're using something like TensorFlow or PyTorch or another higher-level library, so
170
00:24:20.760 --> 00:24:24.630
David John Gagne: I think the training they kind of need more,
171
00:24:26.100 --> 00:24:34.170
David John Gagne: or where I see more user need, is more like: how do you properly use these high-level libraries and get around performance bottlenecks, and
172
00:24:35.640 --> 00:24:43.230
David John Gagne: how do you move your data, like stage your data and move it over to the GPU and back in efficient ways,
173
00:24:44.340 --> 00:24:50.130
David John Gagne: or at least, what libraries should you be using and how should you be calling them? Because there are a lot of gotchas in that process.
174
00:24:51.240 --> 00:24:51.900
You bet.
175
00:24:55.740 --> 00:24:57.030
Mick Coady: Thanks thanks David.
176
00:24:58.950 --> 00:25:00.420
Mick Coady: Anything else DJ.
177
00:25:02.910 --> 00:25:06.480
David John Gagne: That's my main feedback on there.
178
00:25:06.930 --> 00:25:08.310
Mick Coady: Okay, all right thanks.
179
00:25:08.940 --> 00:25:10.980
David John Gagne: Yeah, I think we do have, like,
180
00:25:11.310 --> 00:25:15.960
David John Gagne: we're going to be giving a talk next month on here to go over some of that in more detail.
181
00:25:16.170 --> 00:25:19.140
Mick Coady: Yeah, I know you've had some conversations or exchanges with
182
00:25:20.280 --> 00:25:25.020
Mick Coady: or phone and I on that so looking forward to for sure thanks I.
183
00:25:26.610 --> 00:25:27.060
Mick Coady: Sidd.
184
00:25:28.800 --> 00:25:31.110
Sidd: I was just going to suggest DJ.
185
00:25:32.850 --> 00:25:41.970
Sidd: Obviously I do see where you are coming from and I do agree with you in all respects, but at the same time it's not
186
00:25:43.710 --> 00:25:48.150
Sidd: easy for us to come up with the performance recommendation with.
187
00:25:48.630 --> 00:26:02.880
Sidd: this very, very high-level library usage unless we ourselves are conversant with the usage of those libraries, and unfortunately, though we put together these libraries, we are not the users of these libraries.
188
00:26:03.330 --> 00:26:19.710
Sidd: So is there any way we can collaborate and come up with the course material, with some of the training you provide to us so that we can train the broader community? Is that a possibility?
189
00:26:21.960 --> 00:26:33.870
David John Gagne: Um, I think so, and we're going to, I think, starting with the presentation next month, try to get that process going in a more sustainable manner.
190
00:26:36.510 --> 00:26:47.970
David John Gagne: I think most of the things we've been doing so far have been more tutorials and more on the basics, rather than the scaling side, but it's certainly something we can work on and
191
00:26:49.380 --> 00:26:57.870
David John Gagne: kind of identify where the like major pain points and needs are and try to provide more guidance on that yeah sure.
192
00:26:58.170 --> 00:27:06.030
Sidd: And also, right now, if I understand right the scaling is a different aspect of the problem even.
193
00:27:07.170 --> 00:27:15.720
Sidd: More from the usage of the single gpu side there, I believe we should be able to contribute a lot working with you guys.
194
00:27:18.300 --> 00:27:18.630
David John Gagne: yeah.
195
00:27:19.650 --> 00:27:28.530
David John Gagne: For many people in our group, most of the stuff we do is single GPU, or if we're ever using multiple GPUs, it's in a
196
00:27:29.130 --> 00:27:39.540
David John Gagne: distributed sense, as in just a bunch of single-GPU jobs rather than doing a lot of multi-GPU stuff. A lot of that is driven currently just by
197
00:27:41.490 --> 00:27:44.730
David John Gagne: queue wait times being one of the limiting factors there,
198
00:27:46.620 --> 00:27:48.360
David John Gagne: or where we can get our like.
199
00:27:50.850 --> 00:28:01.620
David John Gagne: the time it takes to train the model. It makes a lot more sense to have it be, you know, slower, but go through the queue like 10 times faster.
200
00:28:02.910 --> 00:28:03.810
David John Gagne: So so that's.
201
00:28:04.950 --> 00:28:10.320
David John Gagne: I think at least that's the logic with working with Casper. I expect that'll change somewhat when we go to
202
00:28:12.480 --> 00:28:13.440
David John Gagne: Derecho.
203
00:28:15.150 --> 00:28:23.940
David John Gagne: And then we'll have some more opportunity to really test some of the scaling-out stuff, and the equation may change in terms of the benefit there.
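The queue-versus-runtime trade-off described above can be made concrete with a small calculation. This is only a sketch: the function name `turnaround_hours` and every number in it are hypothetical, chosen to illustrate the reasoning rather than taken from Casper measurements.

```python
def turnaround_hours(queue_wait_hours: float, run_hours: float) -> float:
    """Total time from job submission to results: queue wait plus run time."""
    return queue_wait_hours + run_hours


# Hypothetical numbers, for illustration only (not measured on any system).
# The single-GPU job runs 4x slower but waits in the queue 10x less.
single_gpu = turnaround_hours(queue_wait_hours=1.0, run_hours=8.0)
multi_gpu = turnaround_hours(queue_wait_hours=10.0, run_hours=2.0)

print(f"single-GPU job: {single_gpu:.1f} h")
print(f"multi-GPU job:  {multi_gpu:.1f} h")
if single_gpu < multi_gpu:
    print("the slower single-GPU job still returns results first")
else:
    print("the multi-GPU job returns results first")
```

With shorter queue waits, as might be the case on a larger system, the same comparison can tip the other way, which is the change in "the equation" mentioned above.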
204
00:28:25.140 --> 00:28:26.250
Sidd: Is yeah.
205
00:28:31.230 --> 00:28:33.570
Mick Coady: Thanks both of you appreciate any other.
206
00:28:34.740 --> 00:28:37.230
Mick Coady: input feedback at this time.
207
00:28:39.600 --> 00:28:47.220
Mick Coady: Again, please feel free to reach out to the whole team, or to Daniel, myself, or Sidd
208
00:28:48.390 --> 00:28:51.180
Mick Coady: individually, if you think of anything
209
00:28:52.680 --> 00:28:54.090
Mick Coady: After the meeting so.
210
00:28:56.820 --> 00:28:59.190
Mick Coady: DJ Do you still have your hand up or is that new.
211
00:29:03.450 --> 00:29:04.980
David John Gagne: That was old okay.
212
00:29:05.130 --> 00:29:07.470
Mick Coady: All right, no, no problem thanks yeah JEREMY.
213
00:29:08.460 --> 00:29:10.500
Jeremy Sauer: yeah, I guess, I just want to do.
214
00:29:11.520 --> 00:29:19.710
Jeremy Sauer: In the context of serving on the Exascale Tiger Team, and, you know, to try to avoid redundancy, because we certainly don't need that,
215
00:29:20.640 --> 00:29:38.130
Jeremy Sauer: but more importantly, to promote synergy: is the essence of the survey y'all are putting out there right now, and I just took it, so I guess I have a sense, but is it really all about
216
00:29:39.690 --> 00:29:50.340
Jeremy Sauer: All you scientists out there, tell us how to best train you to begin using gpus is that really the punch line motivation of it.
217
00:29:51.630 --> 00:29:53.100
Jeremy Sauer: From your perspectives.
218
00:29:55.350 --> 00:29:56.670
Daniel Howard (CSG, NCAR): Since I put it together, I'll say
219
00:29:57.840 --> 00:30:05.670
Daniel Howard (CSG, NCAR): it's kind of both between like the software engineers within those teams, you know as part of the scientists and the scientists themselves because, like there's competing interests there for each group but.
220
00:30:06.180 --> 00:30:15.690
Daniel Howard (CSG, NCAR): Nonetheless, just in terms of developing a training series, which is what we're planning to do in any case, it's just trying to understand our audience as best as possible.
221
00:30:16.920 --> 00:30:31.530
Daniel Howard (CSG, NCAR): And to whatever extent each user might know what best serves them and their needs, it's understanding where people are coming from. I think that's maybe a better way to phrase it, but it's one way, at least, of understanding what we're trying to get at.
222
00:30:33.240 --> 00:30:35.670
Jeremy Sauer: Okay okay yeah I mean, I think.
223
00:30:36.690 --> 00:30:45.240
Jeremy Sauer: One kind of misconception that I think is out there a lot, and it's just something to keep in mind, is the notion that the software engineers are
224
00:30:45.990 --> 00:30:57.750
Jeremy Sauer: developing the code anywhere near exclusively, and I would say even the notion that software engineers are developing the codes in the majority of cases,
225
00:30:58.200 --> 00:31:05.550
Jeremy Sauer: it's simply not true. The scientists at NCAR have historically developed the codes.
226
00:31:06.150 --> 00:31:18.480
Jeremy Sauer: And even when we're talking about small codes, you know, maybe not the gargantuan CESM or WRF or MPAS or some of the major code bases, but even just small software development activities,
227
00:31:19.200 --> 00:31:29.850
Jeremy Sauer: these are quite often not software engineers. These are scientists who have a need, they need to write some software to meet that need, and so they do.
228
00:31:30.420 --> 00:31:43.500
Jeremy Sauer: And I think that's something that we've struggled to come to grips with on the Exascale Tiger Team, and it might be something that may help you in the context of, as you said, understanding your audience.
229
00:31:44.670 --> 00:31:50.700
Jeremy Sauer: I think I don't think many of the science labs have armies of software engineers to take on.
230
00:31:52.200 --> 00:32:11.100
Jeremy Sauer: optimizing or GPU-accelerating or whatever various code bases, and in the end, I think the people who are doing most of the software development are in fact scientists. So just my perspective on that one, as a final two cents.
231
00:32:12.120 --> 00:32:18.630
Daniel Howard (CSG, NCAR): I appreciate that and, if anything, it's a it's speaks to the current cultural landscape within the sciences, I think I even remember one of the.
232
00:32:19.140 --> 00:32:29.520
Daniel Howard (CSG, NCAR): pieces of feedback already, one of those asking about how to do GPU grant proposals and such and getting access, not just to the hardware, but, you know,
233
00:32:30.870 --> 00:32:36.900
Daniel Howard (CSG, NCAR): Getting say other experts or even people time in terms of just like being able to have like grants, people will fund say.
234
00:32:38.130 --> 00:32:44.730
Daniel Howard (CSG, NCAR): software engineers or others to be able to support those projects for the more, say, in-depth optimization parts of a problem, where,
235
00:32:45.120 --> 00:32:49.020
Daniel Howard (CSG, NCAR): you know, as you say, it's mostly scientists that are actually doing most of the work.
236
00:32:49.410 --> 00:32:54.270
Daniel Howard (CSG, NCAR): it's mainly just the nitty gritty stuff that maybe they don't actually want to spend the time on and to what extent we want to.
237
00:32:54.750 --> 00:33:08.400
Daniel Howard (CSG, NCAR): how deep we want to go into that in the training series, I think that's what we're trying to ascertain. But, you know, I appreciate the emphasis and the clarification that it is mostly scientists that are developing these code bases.
238
00:33:12.420 --> 00:33:26.310
Mick Coady: One of the things we've been, I wouldn't say struggling with, but a big question in our mind, Jeremy and everyone, is who's our target audience: is it the scientists,
239
00:33:28.200 --> 00:33:38.160
Mick Coady: the scientific user community, or is it software engineers, or some combination of both, which is where we expect it to land.
240
00:33:39.540 --> 00:33:41.730
Mick Coady: Okay, thanks guys.
241
00:33:43.050 --> 00:33:49.980
Mick Coady: Everyone, this was useful and certainly much more lively than what we were able to.
242
00:33:51.000 --> 00:33:54.780
Mick Coady: tease out of the NHUG or the whip talk from yesterday, so
243
00:33:55.830 --> 00:34:00.240
Mick Coady: I think, in the interest of time we'll move on here then i'm going to.
244
00:34:01.650 --> 00:34:10.740
Mick Coady: just quickly share my screen to let you know that up next is Sidd, who's going to talk to
245
00:34:11.850 --> 00:34:18.660
Mick Coady: us about the need to get some code for NVIDIA's PCAST training, so take it away.
246
00:34:19.410 --> 00:34:30.990
Sidd: Okay, so we contacted NVIDIA about this PCAST training program and, in turn, they asked us if we can provide
247
00:34:33.450 --> 00:34:49.410
Sidd: code which is manageable, but yet not a toy code like Jacobi or matrix multiplication or things like that. So we were just wondering; on the Internet we can get thousands of codes like that, but
248
00:34:50.790 --> 00:35:05.160
Sidd: roughly our requirement is something like a few thousand lines of code, preferably in Fortran, preferably of interest to the general user community of the universities and NCAR.
249
00:35:05.760 --> 00:35:28.950
Sidd: So do you guys have any suggestions for us, or do you already have such code that we can use freely for our training purposes, shared with us or shared using our own GitHub repository? So with that, let me open it up for questions. John, go ahead.
250
00:35:30.090 --> 00:35:43.710
John Dennis (he/him): Well, I was wondering if the Morrison-Gettelman microphysics would be a good example. I don't think it has any bugs in there at the moment, but we can always inject some.
251
00:35:47.640 --> 00:35:53.250
Sidd: So can you tell us a little more detail? I mean, is it a standalone code or a
252
00:35:53.880 --> 00:36:16.350
John Dennis (he/him): Well, earlier versions of it were what we were using in the procurement, so it's kind of a modestly sized code, it's easy for vendors to run, and the version we have now has a bunch of
253
00:36:17.670 --> 00:36:19.890
John Dennis (he/him): OpenACC directives in it, so
254
00:36:21.510 --> 00:36:29.100
Sidd: Okay, of course, we can play around with it. And so it has the input set of data, and we can remove the,
255
00:36:31.170 --> 00:36:41.940
Sidd: I mean, those directives we can remove, and give it to others for inserting those in the right places, and things like that. And how big is the code?
256
00:36:45.840 --> 00:36:47.370
John Dennis (he/him): Jen are you are you on.
257
00:36:48.750 --> 00:37:00.810
John Dennis (he/him): I think, I want to say, it's like 6,000 lines or 10,000 lines or something like that. You know, Jen fixed a bug like three days ago, and he could just reinsert it.
258
00:37:06.990 --> 00:37:12.480
John Dennis (he/him): So it's perhaps a little bit larger than you want.
259
00:37:13.710 --> 00:37:31.050
Sidd: On the larger side, but certainly we would like to consider it. So can you point us towards the repository or whatever? And also the licensing part: is it okay for us to use it freely and share it?
260
00:37:33.030 --> 00:37:41.880
John Dennis (he/him): Yeah, no, I'll talk to Jen. I didn't realize he wasn't on, so I'll talk to Jen, and it's, I mean, it's
261
00:37:42.930 --> 00:37:49.920
John Dennis (he/him): we've distributed it to Cray and we've distributed it to NVIDIA, so, I mean, yeah.
262
00:37:51.420 --> 00:37:51.780
Sidd: Okay.
263
00:37:53.460 --> 00:38:00.210
Sidd: Any additional ones? We were looking for a few, I mean, if we can get
264
00:38:03.180 --> 00:38:05.760
Mick Coady: yeah having more than one would certainly be.
265
00:38:05.820 --> 00:38:06.150
Sidd: here.
266
00:38:06.330 --> 00:38:07.650
Mick Coady: We wouldn't turn it down.
267
00:38:07.860 --> 00:38:13.350
John Dennis (he/him): So Supreeth had a good idea, the swim code; that's an easier code.
268
00:38:15.900 --> 00:38:18.390
John Dennis (he/him): So it's a C++ code.
269
00:38:18.600 --> 00:38:18.990
yeah.
270
00:38:20.670 --> 00:38:35.970
Supreeth Madapur Suresh: Yeah, you mentioned Fortran, but yeah, if you need a C++ code, we have the swim code, which is fairly simple and is just 600 lines, and we even have the Kokkos port and OpenACC, OpenMP, everything.
271
00:38:37.830 --> 00:38:52.410
Sidd: Okay, so that is also interesting, and if it is 600 lines, probably we can convert it into Fortran if we like. So a little more detail: is it finite volume or spectral? And swim is?
272
00:38:53.220 --> 00:38:54.450
Supreeth Madapur Suresh: Shallow water model.
273
00:38:54.750 --> 00:39:10.740
Sidd: Yeah, the name sounds familiar, because it was once part of the SPEC benchmark, so I assume that's a similar one. But is it spectral, right? Is it globally spectral or gridded?
274
00:39:11.850 --> 00:39:14.340
Supreeth Madapur Suresh: Oh, I'm not quite sure. Okay.
275
00:39:14.430 --> 00:39:16.830
John Dennis (he/him): Well, it's definitely finite difference.
276
00:39:17.280 --> 00:39:34.740
Sidd: Okay, finite difference, great, and that should be great. Okay, please pass that on. And also, an important part is we want to be able to share and distribute it freely, without any restriction on that, so
277
00:39:35.070 --> 00:39:39.570
Supreeth Madapur Suresh: yeah I don't think there's any license associated with it, as far as i'm aware.
278
00:39:40.080 --> 00:39:42.390
Supreeth Madapur Suresh: So, and we have been sharing that with.
279
00:39:42.420 --> 00:39:45.660
Supreeth Madapur Suresh: External vendors for quite a long time, so I think that.
280
00:39:46.680 --> 00:39:50.010
Sidd: And all these different ports are in different branches I.
281
00:39:50.010 --> 00:39:55.860
Sidd: guess, so that we can get a pristine copy without any of these ports?
282
00:39:55.920 --> 00:39:57.150
Supreeth Madapur Suresh: Yes, that's correct okay.
283
00:39:59.730 --> 00:40:03.450
Supreeth Madapur Suresh: There's an Intel oneAPI port as well, so
284
00:40:04.080 --> 00:40:11.190
Sidd: Great, that sounds like the code we should be using. And now,
285
00:40:12.210 --> 00:40:13.470
Sidd: quick question is it.
286
00:40:15.420 --> 00:40:18.150
Sidd: On a sphere, or on a rectangle or what.
287
00:40:19.380 --> 00:40:21.150
Supreeth Madapur Suresh: It is on a sphere. Okay.
288
00:40:21.420 --> 00:40:26.880
Sidd: So we can do some kind of visualization too. Okay, I'm excited about it; please pass it on.
289
00:40:27.960 --> 00:40:28.470
Supreeth Madapur Suresh: Oh yeah.
290
00:40:29.850 --> 00:40:35.520
Supreeth Madapur Suresh: Any of you, if you want access to the code, if you can give me your GitHub ID, I can add you to the repository.
291
00:40:36.240 --> 00:40:45.510
Sidd: Okay, so what I will do is send my GitHub ID to you by email instead of flashing it here.
292
00:40:46.740 --> 00:40:48.660
Supreeth Madapur Suresh: Yeah, sure, that works, thank you.
293
00:40:49.590 --> 00:40:49.980
Supreeth.
294
00:40:51.120 --> 00:40:57.120
Supreeth Madapur Suresh: Brian, if you want access to the code, please give me your GitHub ID and I'll add you to the repository.
295
00:40:59.910 --> 00:41:02.040
Mick Coady: Was that responding to Brian Dobbins?
296
00:41:02.160 --> 00:41:03.810
Supreeth Madapur Suresh: Yes, okay okay good.
297
00:41:04.230 --> 00:41:05.490
Mick Coady: Just wanted to make sure you saw it.
298
00:41:06.630 --> 00:41:09.210
Mick Coady: Good. All right, Sidd, I think that we
299
00:41:11.460 --> 00:41:14.070
Sidd: can say that relation to what I was looking at was.
300
00:41:14.460 --> 00:41:15.450
Mick Coady: It was a success.
301
00:41:15.510 --> 00:41:16.020
yeah yeah.
302
00:41:17.190 --> 00:41:25.920
Mick Coady: Thanks, John Dennis and Supreeth, that was terrific. Okay, then, just to close up here, I want to
303
00:41:28.290 --> 00:41:29.280
Mick Coady: quickly show.
304
00:41:30.570 --> 00:41:31.710
Mick Coady: If you saw on the.
305
00:41:33.810 --> 00:41:38.340
Mick Coady: agenda, I did want to just point out that I found
306
00:41:39.600 --> 00:41:52.410
Mick Coady: a small bug in the Casper V100 resource table earlier this week. I know it's not widely popular, but I know some of you do use that, and
307
00:41:53.550 --> 00:42:03.360
Mick Coady: here's a snapshot, which I hope you can see right now, of what the node status table looked like earlier today.
308
00:42:04.710 --> 00:42:12.180
Mick Coady: You can see that there were no jobs running in the gpudev queue, all right, otherwise they would have shown up here.
309
00:42:13.500 --> 00:42:21.600
Mick Coady: If we scroll down and look at the next table on that page, which is the V100 node status,
310
00:42:23.010 --> 00:42:31.410
Mick Coady: knowing that casper 25 is normally assigned to that queue, you can see that
311
00:42:32.610 --> 00:42:38.820
Mick Coady: in terms of the number of busy GPUs and available GPUs, it's wrong, right?
312
00:42:40.020 --> 00:42:40.830
Mick Coady: This is.
313
00:42:42.780 --> 00:42:47.070
Mick Coady: i'm not going to blame PBS on this, but it's clearly.
314
00:42:49.740 --> 00:42:57.210
Mick Coady: either a misunderstanding on my part or a bug in PBS. It shows that, I think, because the node is reserved,
315
00:42:58.320 --> 00:43:01.740
Mick Coady: it assumes that all the components,
316
00:43:02.790 --> 00:43:19.980
Mick Coady: both the CPUs and GPUs, are busy. So I have already started working on a workaround for this, but for those of you who were relying on the accuracy of this table to kind of figure out why your jobs aren't running, or what you might be able to expect to get through the queue
317
00:43:21.330 --> 00:43:22.830
Mick Coady: in a reasonable amount of time,
318
00:43:24.210 --> 00:43:30.600
Mick Coady: this needs a little more work, so I just wanted to alert you to that quickly, and
319
00:43:32.790 --> 00:43:38.880
Mick Coady: hopefully I'll have that done here within the next couple of days. Another feature that will be coming up on here is,
320
00:43:40.020 --> 00:43:43.740
Mick Coady: for whatever node is assigned to that,
321
00:43:44.820 --> 00:44:04.440
Mick Coady: or is reserved, whether it's for the gpudev queue or not, I'm going to add another option here into the state column, probably some abbreviation of the word "reserved", so you'll be able to see that. So
322
00:44:06.210 --> 00:44:06.750
Mick Coady: In.
323
00:44:08.010 --> 00:44:12.990
Mick Coady: So with that we're almost we're getting close to top of the hour.
324
00:44:14.130 --> 00:44:33.840
Mick Coady: wanted to remind everybody about the gtd wiki homepage here at this URL you can see there, and of course i'm always interested in your ideas for agenda items for meetings coming up.
325
00:44:36.060 --> 00:44:39.210
Mick Coady: As well as this is something JEREMY had brought up.
326
00:44:40.290 --> 00:44:42.870
Mick Coady: last meeting or the meeting before last, that
327
00:44:44.190 --> 00:44:49.050
Mick Coady: the attendance at these meetings is pretty heavily CISL-focused.
328
00:44:50.430 --> 00:45:01.110
Mick Coady: Not that they're all Consulting Services or even members of HPCD; we have quite a few members from TDD, like John Dennis and others, so
329
00:45:02.700 --> 00:45:04.320
Mick Coady: What we're always looking for.
330
00:45:04.530 --> 00:45:11.490
Mick Coady: I'm very open to adding new members and inviting new folks along here, so
331
00:45:13.290 --> 00:45:14.100
Mick Coady: We will.
332
00:45:15.690 --> 00:45:24.240
Mick Coady: Please give me your feedback, either directly or into the group okay boom and john.
333
00:45:25.620 --> 00:45:28.230
Mick Coady: john Dennis the answer to your question is yes.
334
00:45:29.190 --> 00:45:29.610
John Dennis (he/him): Thank you.
335
00:45:30.000 --> 00:45:31.230
Mick Coady: You bet you bet.
336
00:45:32.610 --> 00:45:44.370
Mick Coady: With that, does anybody have anything else they want to bring up quickly? Or else I'll let you go and give you two minutes back in your afternoon.
337
00:45:48.420 --> 00:45:50.400
Mick Coady: doesn't sound like it yeah.
338
00:45:50.430 --> 00:45:52.080
Jeremy Sauer: This is Jeremy.
339
00:45:52.110 --> 00:45:52.770
Mick Coady: And yes, your.
340
00:45:53.010 --> 00:45:58.680
Jeremy Sauer: thoughts on something I might I personally would be interested to understand.
341
00:45:59.820 --> 00:46:13.170
Jeremy Sauer: a little more, that I've never had the time to look into but have been affected by, is UCX in Open MPI. What is it, what's its point, why is it so devastatingly
342
00:46:14.820 --> 00:46:27.990
Jeremy Sauer: cumbersome, buggy, causing major problems, what kind of things can be done to address that, et cetera. It's something that I, as a scientist slash computational scientist,
343
00:46:28.500 --> 00:46:44.280
Jeremy Sauer: don't really have the time to dig into the details of, or go wading through Google and reading whatever blog articles are out there and whatnot. So it'd be a huge benefit to me if, you know, some of the folks in CISL who may be more closely
344
00:46:45.570 --> 00:46:57.990
Jeremy Sauer: aligned with that kind of a topic and understanding could help inform me about it, and maybe there are others as well, but it was just one thing that I thought of as a potential topic for a future meeting.
345
00:46:58.500 --> 00:47:16.290
Mick Coady: Thank you, that's a great suggestion. I think within the consulting group Brian Vanderwende is probably most familiar with it, but Sidd and Rory certainly know more than I do, and within the
346
00:47:18.150 --> 00:47:40.080
Mick Coady: HSG, with the system engineers, I'm going to guess that Ben Matthews is probably pretty up to speed on that too. So that's just a way of saying that I'm sure we can find some folks to make a presentation, if not next month, then at January's.
347
00:47:42.840 --> 00:47:43.530
Mick Coady: Yeah, Supreeth.
348
00:47:44.430 --> 00:47:46.830
Supreeth Madapur Suresh: Oh yeah, I just wanted to respond to Jeremy.
349
00:47:48.330 --> 00:47:57.540
Supreeth Madapur Suresh: We can start the discussion, but if you want a working copy of UCX and some of the options that you can use to optimize your code,
350
00:47:58.200 --> 00:48:11.490
Supreeth Madapur Suresh: then we can provide you with a sample script. On the other hand, we are trying to include all of these UCX options inside a container so that users don't have to worry about it.
351
00:48:15.720 --> 00:48:16.440
Mick Coady: that'd be great.
352
00:48:16.680 --> 00:48:24.000
Jeremy Sauer: I mean, I guess, from my perspective as a scientific applications programmer, I really don't want to know anything about UCX.
353
00:48:24.870 --> 00:48:29.160
Jeremy Sauer: I've used MPI for almost two decades and not had to worry too much
354
00:48:29.190 --> 00:48:38.790
Jeremy Sauer: about underlying details, but then UCX came along and sort of standard MPI doesn't work in some cases, simple things like
355
00:48:39.270 --> 00:48:49.860
Jeremy Sauer: asynchronous send/receives and certain collective reductions and things like that. And that's the thing about UCX: I don't know, either I need to learn more
356
00:48:50.340 --> 00:49:06.390
Jeremy Sauer: or I'd like to be more insulated from it. Ideally, I feel it's low-level enough that really the applications programmers should not have to think about that as a first-order consideration, but that's again just my opinion. Maybe I need to learn more at this point.
357
00:49:07.830 --> 00:49:19.470
Supreeth Madapur Suresh: I totally agree. It's not just for us; even for some of the folks at NVIDIA, especially with GPUDirect, the options with UCX have been problematic.
358
00:49:19.950 --> 00:49:34.560
Supreeth Madapur Suresh: I would really appreciate it if the UCX developers just provided one option, GPUDirect, and enabled all the options that are necessary by default. So yeah, I totally agree with you.
359
00:49:35.700 --> 00:49:52.770
Sidd: So, a question to you guys, Supreeth and Jeremy. I really appreciate your feedback, and I wish Brian had been on this call; unfortunately he's on PTO today. I mean Brian Vanderwende.
360
00:49:53.910 --> 00:49:54.360
Sidd: But.
361
00:49:56.130 --> 00:50:17.130
Sidd: in the end, it's like setting the right set of environment variables to get it working correctly as expected, right? So would you recommend that we on the consulting side set up our default set of modules with UCX variables for Open MPI,
362
00:50:18.570 --> 00:50:20.790
Sidd: so that you guys do not even have to think about it?
363
00:50:23.970 --> 00:50:36.600
Mick Coady: I'd suggest that we ask people to think about that between now and the next meeting, and open that up for conversation after whoever makes the presentation next month.
364
00:50:38.550 --> 00:50:38.910
Sidd: Okay.
365
00:50:39.660 --> 00:50:48.510
Mick Coady: Yeah, just in the interest of everybody's time. I'm sure we've already got some folks dropping off; I'm sure they've got other things they need to run to, so
366
00:50:50.520 --> 00:50:55.800
Mick Coady: but I'm willing to stay on and listen to feedback on that.
367
00:50:58.530 --> 00:51:04.980
Sidd: So Supreeth, can you provide your feedback now, even just for us to think about it?
368
00:51:06.360 --> 00:51:07.980
Supreeth Madapur Suresh: It really depends on.
369
00:51:10.530 --> 00:51:11.910
Supreeth Madapur Suresh: Like what the code is right.
370
00:51:13.320 --> 00:51:20.460
Supreeth Madapur Suresh: For example, some of the options might be better for cases where we are utilizing the whole node, not just the GPUs.
371
00:51:21.570 --> 00:51:25.800
Supreeth Madapur Suresh: But at least we configured to a set of options, but.
372
00:51:27.840 --> 00:51:42.660
Supreeth Madapur Suresh: If we are just utilizing the GPUs, then there might be another set of options that we want to configure. If we can override the default settings, I think that could be okay.
373
00:51:44.250 --> 00:51:57.750
Sidd: Yeah, then we can group together a bunch of such cases. For example, you are using mostly GPUs, a GPU-resident code like Jeremy's, or you are using host
374
00:51:58.380 --> 00:52:20.850
Sidd: plus GPUs, or mostly host. That is three different kinds of options, three different groups of options we can group into. And at the end of the day, if that is not adequate, this is not prescriptive, meaning you can always override with your own settings if you don't like ours.
375
00:52:21.960 --> 00:52:31.650
Sidd: But at least the kind of problem Jeremy was describing, that asynchronous send/receive doesn't work, sounds really problematic.
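The grouping proposed above (GPU-resident, host plus GPU, mostly host, with user overrides always winning) could be sketched roughly as follows. This is a hypothetical illustration, not CISL's actual module setup: the profile names and the helper `build_env` are invented here, and while `UCX_TLS` and `UCX_MEMTYPE_CACHE` are real UCX environment variables, the values shown are placeholders rather than tested recommendations.

```python
import os

# Hypothetical default profiles, one per usage pattern named in the discussion.
# The values are illustrative placeholders, not tuned or validated settings.
UCX_PROFILES = {
    "gpu_resident": {
        "UCX_TLS": "rc,sm,self,cuda_copy,cuda_ipc,gdr_copy",
        "UCX_MEMTYPE_CACHE": "n",
    },
    "host_plus_gpu": {"UCX_TLS": "rc,sm,self,cuda_copy,cuda_ipc"},
    "mostly_host": {"UCX_TLS": "rc,sm,self"},
}


def build_env(profile, user_overrides=None, base_env=None):
    """Return an environment dict: base env, then profile defaults, then user overrides."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(UCX_PROFILES[profile])   # site defaults for this usage pattern
    env.update(user_overrides or {})    # the user's own settings always win
    return env


# A user accepts the GPU-resident defaults but overrides the transport list.
env = build_env("gpu_resident", user_overrides={"UCX_TLS": "tcp,self"})
print(env["UCX_TLS"], env["UCX_MEMTYPE_CACHE"])
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when launching a job; applying the overrides last is what keeps the defaults non-prescriptive.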
376
00:52:34.470 --> 00:52:36.780
Jeremy Sauer: Yeah, Sidd, from my perspective, I mean,
377
00:52:38.100 --> 00:52:54.360
Jeremy Sauer: well, we've been seeing it for some number of months. Of course, FastEddy is GPU-resident, right, so we're primarily on the GPU. Our communications use CUDA-aware MPI, so we're going GPUDirect when we're on node, and then
378
00:52:56.340 --> 00:53:02.040
Jeremy Sauer: I forget what the other way is called when you've got to jump a node, but it's still GPU to GPU ultimately.
379
00:53:02.460 --> 00:53:16.080
Jeremy Sauer: Yeah, either way, the killer was when all of a sudden there were no more standard builds of anything less than Open MPI 4, and those are all Open MPI 4 and on UCX.
380
00:53:16.620 --> 00:53:26.850
Jeremy Sauer: We would find NaNs out of nowhere, right, after years of running, you know, kind of standard stuff, not changing any of our MPI communications or anything like that. As soon as
381
00:53:27.150 --> 00:53:32.130
Jeremy Sauer: Open MPI 4 comes around and there's UCX on the back end of it: NaNs.
382
00:53:32.220 --> 00:53:38.010
Jeremy Sauer: Like, just out of nowhere, not really reproducible. We couldn't even hone it down to, say,
383
00:53:39.180 --> 00:53:43.560
Jeremy Sauer: a particular time step or even a particular
384
00:53:44.820 --> 00:53:52.440
Jeremy Sauer: set of halo exchanges. And of course I spoke with Brian about it a number of times, and, you know,
385
00:53:53.670 --> 00:53:59.760
Jeremy Sauer: the request that usually comes back to me is, well, can you give me a minimal reproducer?
386
00:54:00.510 --> 00:54:08.370
Jeremy Sauer: And what I'm saying is, what I observed with UCX is not really amenable to me spending my time producing a minimal reproducer.
387
00:54:08.700 --> 00:54:25.620
Jeremy Sauer: So it's these kinds of things that are just absolute showstoppers. We're doing great, we actually have a code we took the time to accelerate with GPUs, and UCX broke it, and I don't know how to diagnose it, I don't know how to
388
00:54:26.790 --> 00:54:29.100
Jeremy Sauer: Make it noticeable to other people.
389
00:54:30.270 --> 00:54:36.420
Jeremy Sauer: And that's sort of the conundrum so I guess yeah my question is well.
390
00:54:37.620 --> 00:54:47.130
Jeremy Sauer: How do we solve these kinds of issues because they're they're pretty frightening actually right, I mean it'd be like yeah like all of a sudden mpi just doesn't work the way it's supposed to.
391
00:54:47.610 --> 00:54:57.840
Jeremy Sauer: And and it's at some low level so far beyond my expertise i'm just not even sure which which direction to head towards a solution that's where i'm struggling.
392
00:54:58.380 --> 00:55:08.850
Sidd: Okay, I do hear you. Now, the question is, we'll have to reproduce that in order to even generate an optimal set for you.
393
00:55:10.350 --> 00:55:16.920
Sidd: But right now, are you okay? I mean, have you already found something that works for you, or are you really stuck?
394
00:55:17.130 --> 00:55:17.760
Yes.
395
00:55:19.260 --> 00:55:26.460
Jeremy Sauer: Brian builds us an Open MPI that doesn't use UCX; that works every time without fail. Okay.
396
00:55:26.820 --> 00:55:31.440
Sidd: Okay, I think I need to take this conversation to Brian, then, in order to learn more.
397
00:55:32.520 --> 00:55:44.310
Sidd: And then, perhaps next week, we can discuss more intelligently, I mean, instead of you teaching me all the history. Obviously Brian is already ahead of me in this.
398
00:55:45.300 --> 00:55:56.340
Jeremy Sauer: Yeah, that's about the gist of it, I think, what I've conveyed, and Brian may know more, so absolutely, I'm sure conversations with him will be helpful as well. And please do
399
00:55:57.270 --> 00:56:04.110
Jeremy Sauer: engage me as you think would be beneficial, and I'll do my best to provide whatever information I can.
400
00:56:04.590 --> 00:56:18.840
Sidd: Exactly. Even if you cannot provide a minimal set of code, see, for example, in the last minute you told us it's the asynchronous halo exchange, but
401
00:56:19.440 --> 00:56:39.180
Sidd: the asynchronous halo exchange of what kind of size; if you can tell us that, at least we can try to come up with a toy code that does just that, and try to see if we can reproduce it and fix it using another set of APIs or another set of environment variables with UCX.
402
00:56:40.230 --> 00:56:44.430
Sidd: But anyway, you get what I am driving towards yeah.
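The toy code suggested above might start from something like the following sketch. It is a stand-in only: the exchange is done in-process on plain Python lists so the checking logic can be run anywhere, whereas a real reproducer would replace `exchange_halos` with non-blocking MPI sends and receives on GPU buffers. All function names and sizes here are hypothetical.

```python
import math


def make_subdomains(n_ranks, n_interior):
    """One list per simulated rank: [left ghost, interior cells..., right ghost]."""
    domains = []
    for rank in range(n_ranks):
        interior = [float(rank * n_interior + i) for i in range(n_interior)]
        domains.append([math.nan] + interior + [math.nan])  # ghosts start as NaN
    return domains


def exchange_halos(domains):
    """Periodic 1-D halo exchange, performed in-process as a stand-in for MPI."""
    n = len(domains)
    # Snapshot the edge cells first so every "send" uses pre-exchange data.
    left_edges = [d[1] for d in domains]
    right_edges = [d[-2] for d in domains]
    for rank, d in enumerate(domains):
        d[0] = right_edges[(rank - 1) % n]   # received from the left neighbor
        d[-1] = left_edges[(rank + 1) % n]   # received from the right neighbor


def find_bad_halos(domains):
    """Return (rank, side) for each ghost cell that is NaN or holds the wrong value."""
    n = len(domains)
    bad = []
    for rank, d in enumerate(domains):
        expected_left = domains[(rank - 1) % n][-2]
        expected_right = domains[(rank + 1) % n][1]
        if math.isnan(d[0]) or d[0] != expected_left:
            bad.append((rank, "left"))
        if math.isnan(d[-1]) or d[-1] != expected_right:
            bad.append((rank, "right"))
    return bad


domains = make_subdomains(n_ranks=4, n_interior=8)
exchange_halos(domains)
print(find_bad_halos(domains))
```

Checking every ghost cell after every exchange, rather than waiting for NaNs to surface in the solution, is one way to narrow an intermittent failure down to a specific rank, side, and step.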
403
00:56:44.970 --> 00:56:46.290
Jeremy Sauer: Absolutely, thank you, Sidd.
404
00:56:47.340 --> 00:56:47.910
Sidd: Thank you yeah.
405
00:56:49.350 --> 00:56:51.420
Mick Coady: thank both of you okay.
406
00:56:54.120 --> 00:56:54.990
Mick Coady: Anything else.
407
00:56:58.650 --> 00:57:03.900
Mick Coady: All right, we'll do that thanks everyone and.
408
00:57:05.100 --> 00:57:23.250
Mick Coady: I think we've got a pretty good lineup already for next month's meeting, which is exciting because I usually am scrambling to think of some relevant content, but you guys have been very helpful today. So have a good rest of your day and we'll talk to y'all soon, I'm sure.
409
00:57:25.470 --> 00:57:25.860
Sidd: Thank you.
410
00:57:26.880 --> 00:57:27.150
Mick Coady: bye.
411
00:57:27.960 --> 00:57:29.700
Davide Del Vento: bye thanks bye bye.