Best way to program combinations?
Hi!
What is the most efficient approach when you want to work with combinations in a program? And which programming language?
Is a database the fastest option?
Unfortunately I hardly know anything about this…
Let me briefly outline what I want to do.
For example, 100 text files are to be loaded, each containing roughly 300,000 lines with numbers. In total there are only 500,000 distinct numbers. Now I want to merge 15 files at a time.
If, say, 6 files already reach the full count of 500,000, no further combining is done there.
This way I want to find out which combinations of 15 files stay below 500,000.
I hope it's understandable in this stripped-down form 😁
Best regards!
So I'll take a stab at what you mean…
You have files containing the numbers, with 300,000 distinct numbers across all files. You read a file (File_1) and determine which unique numbers it contains. Then you read the next one (File_2) and determine which unique numbers it contains that are not already in File_1… You continue until File_1, …, File_n together contain all the distinct numbers. File_1, …, File_n then form a group of files. Then the same game starts again with File_(n+1), File_(n+2), and so on.
Many languages can do this, for example Python.
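The sequential grouping described above can be sketched in a few lines of Python (file names and the total count of distinct numbers are assumptions; this is a sketch, not a tuned implementation):

```python
def group_files(filenames, total_distinct):
    """Walk the files in order and start a new group whenever the
    numbers collected so far cover all distinct values."""
    groups, current, seen = [], [], set()
    for name in filenames:
        with open(name) as f:
            seen.update(line.strip() for line in f if line.strip())
        current.append(name)
        if len(seen) >= total_distinct:   # group is complete
            groups.append(current)
            current, seen = [], set()
    if current:                           # trailing, incomplete group
        groups.append(current)
    return groups
```

A `set` makes the uniqueness test cheap; 500,000 short strings fit comfortably in memory.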
Yes, about 300,000; it could be a bit more or less.
The rest fits.
How do you build this? The number of combinations runs into the umpteen billions!
Why? 1–15 form a group, 16–30 the second, then 31–45, 46–60, 61–75, 76–90, 91–100. That gives me only 7 result files with at most 500,000 lines each. Nothing special, and it runs through in under a minute.
Or you expressed it wrong and mean something else.
So, if you want the groups of 15 files to be optimal in the sense that each contains the maximum number of distinct numbers (some complete, others incomplete), then you need to build an index. The index records, for each number, which files it occurs in. This lets you quickly answer the question of how many complete file groups you can create at most. You can also search "fail first": take the rarest number, take the first file that contains it, and then find the file with the least overlap. You continue like this until the group is full, then take the next unused file in the list.
This might lead to a "good" solution. Overall, I suspect the problem is NP-hard.
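A minimal sketch of that "fail first" heuristic, assuming each file's numbers have already been read into a set (the name `fail_first_group` and the data layout are made up for illustration):

```python
def fail_first_group(files, group_size=15):
    """files: dict mapping file name -> set of numbers it contains.
    Greedy heuristic: start from a file covering the rarest number,
    then repeatedly add the file contributing the most new numbers."""
    index = {}                            # number -> files containing it
    for name, nums in files.items():
        for n in nums:
            index.setdefault(n, set()).add(name)
    rarest = min(index, key=lambda n: len(index[n]))
    first = min(index[rarest])            # first file that contains it
    group, covered = [first], set(files[first])
    remaining = set(files) - {first}
    while remaining and len(group) < group_size:
        # "least overlap" = most numbers not yet covered
        best = max(remaining, key=lambda f: len(files[f] - covered))
        group.append(best)
        covered |= files[best]
        remaining.discard(best)
    return group, covered
```

As the post says, this is only a heuristic: it tends to find a good group quickly, but it does not guarantee an optimal one.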
Yeah, 15 out of 100 is anything but trivial. That's why you need to be smarter about it, and why you should sit down with someone who can program and who understands the problem you're trying to solve.
It's about combinations, not fixed groups; 1–15 was just an example.
After combining 1–15 comes 1–14 + 16, then 1–14 + 17, etc.
But all of that is skipped once 1–10 are already full.
It becomes more interesting once you have an optimality criterion. For that you'd need an index.
trivial
I wouldn't bother converting [01][02][43] into numbers; simply sorting the strings is enough. You only want to know how many elements are left anyway…
Millions of data lines from files is a hefty amount in any language. What the program does in memory is less of a problem than reading the files!
Iterate over the desired files
…the second buffer can then be written to a file of your choice, or whatever.
to get a block of 15 files
Instead of useless pseudocode…
Quick & dirty in PowerShell (Get-Content is not exactly a racehorse when reading large files, but clearer for a demo):
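For comparison, the same quick-and-dirty merge of one block of 15 files sketched in Python (file names are placeholders; duplicates are dropped on the fly):

```python
def merge_block(filenames):
    """Concatenate the lines of the given files, keeping each
    value only once, in first-seen order."""
    seen = set()
    merged = []
    for name in filenames:
        with open(name) as f:
            for line in f:
                value = line.strip()
                if value and value not in seen:
                    seen.add(value)
                    merged.append(value)
    return merged
```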
I believe that if you can formulate your problem in a comprehensible way, the solution is no longer far away.
So 300,000 numbers per file? By what criteria should 15 files be merged? And what for?
They are simply joined together, as if you concatenated all the rows of all 15 files into one; duplicate rows are dropped.
Whatever you want…
So you take file 1 with its 300,000 numbers, add from file 2 all the numbers that aren't already in it, then file 3, etc., until you have either used file 15 or the target file contains 500,000 numbers? If that's reached at file 10, files 11–15 are ignored?
Do you have any requirements for the format of the target file? Or does it just contain the numbers?
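The procedure just described might look like this in Python (the limit and the file list are assumptions; the loop stops as soon as the target set is full, so later files are ignored):

```python
def merge_until_full(filenames, limit=500_000, max_files=15):
    """Merge up to max_files files, stopping early once the target
    holds `limit` distinct numbers (e.g. full at file 10 -> 11-15 skipped)."""
    target = set()
    for name in filenames[:max_files]:
        with open(name) as f:
            target.update(line.strip() for line in f if line.strip())
        if len(target) >= limit:
            break
    return target
```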
Yes, as you described.
The target file should just contain the numbers, i.e. whatever is in the lines.
The numbers don't run from 10 to 500,000 but e.g. from 1 million to 5 million. The set of numbers is specified by a file. So it would be best if an additional file serves as a template for what the "frame" is.
That effectively sets the frame of 500,000, so it's better to work at the line level.
I still don’t know what you want.
You obviously have a set D = {D_1, D_2, …} of files, each of which can be viewed as a set D_i of lines. You know the set Z of all possible lines. And now you're looking for groups of 15 files whose union is not Z. You need:
I would process such amounts of data in Python and maintain, for each line, the set of files that contain this line. These sets can be represented efficiently as integers, each bit corresponding to one file:
If, after that, some line occurs in at most 85 (= 100 − 15) files, then any 15 files that don't contain it are a possible solution. Otherwise (if every line occurs in at least 86 files) there is no solution:
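A small sketch of that representation (the files' lines are assumed to be already loaded into lists; `build_index` is a made-up name):

```python
def build_index(file_lines):
    """Map each line to an integer bitmask: bit i is set exactly
    when the line occurs in file_lines[i]."""
    index = {}
    for i, lines in enumerate(file_lines):
        bit = 1 << i
        for line in set(lines):           # count each file once
            index[line] = index.get(line, 0) | bit
    return index
```

Bitmasks reduce the per-line bookkeeping to a single integer, and set operations on files become fast bitwise AND/OR.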
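That feasibility check could look like this, reusing a line-to-bitmask index as described above (function name and parameters are illustrative):

```python
def find_solution(index, total_files, group_size=15):
    """If some line occurs in at most total_files - group_size files,
    the files *not* containing it yield a solution; otherwise None."""
    threshold = total_files - group_size
    for line, mask in index.items():
        if bin(mask).count("1") <= threshold:   # popcount
            absent = [i for i in range(total_files)
                      if not mask & (1 << i)]
            return absent[:group_size]          # any group_size of these works
    return None
```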
Enumerating all possible groups is a bit more difficult. In principle, every group of 15 files (as above) counts once for each line missing from it, but groups are counted multiple times if two or more lines are missing from their union. It is certainly not practical to remember all previously generated groups in order to avoid duplicates, because there are far too many.
In principle, you can ignore all lines that occur in 86 or more files (because those occur in every group's union anyway), and then sort out the files that lie in the intersection of the remaining lines' file sets. This should reduce the amount of data considerably.
However, how exactly to proceed depends on whether you want to distinguish two different file groups that have the same union: